- Title
C-CORE: Clustering by Code Representation to Prioritize Test Cases in Compiler Testing.
- Authors
Wei Zhou; Xincong Jiang; Chuan Qin
- Abstract
Edge devices, due to their limited computational and storage resources, often require the use of compilers for program optimization. Therefore, ensuring the security and reliability of these compilers is of paramount importance in the emerging field of edge AI. One widely used testing method for this purpose is fuzz testing, which detects bugs by inputting random test cases into the target program. However, this process consumes significant time and resources. To improve the efficiency of compiler fuzz testing, it is common practice to utilize test case prioritization techniques. Some researchers use machine learning to predict the code coverage of test cases, aiming to maximize the test capability for the target compiler by increasing the overall predicted coverage of the test cases. Nevertheless, these methods can only forecast the code coverage of the compiler at a specific optimization level, potentially missing many optimization-related bugs. In this paper, we introduce C-CORE (short for Clustering by Code Representation), the first framework to prioritize test cases according to their code representations, which are derived directly from the source codes. This approach avoids being limited to specific compiler states and extends to a broader range of compiler bugs. Specifically, we first train a scaled pre-trained programming language model to capture as many common features as possible from the test cases generated by a fuzzer. Using this pre-trained model, we then train two downstream models: one for predicting the likelihood of triggering a bug and another for identifying code representations associated with bugs. Subsequently, we cluster the test cases according to their code representations and select the highest-scoring test case from each cluster as the high-quality test case. This reduction in redundant testing cases leads to time savings.
Comprehensive evaluation results reveal that code representations are better at distinguishing test capabilities, and C-CORE significantly enhances testing efficiency. Across four datasets, C-CORE increases the average of the percentage of faults detected (APFD) value by 0.16 to 0.31 and reduces test time by over 50% in 46% of cases. When compared to the best results from approaches using predicted code coverage, C-CORE improves the APFD value by 1.1% to 12.3% and achieves an overall time-saving of 159.1%.
- Subjects
LANGUAGE models; COMPILERS (Computer programs); PROGRAMMING languages; LEAD time (Supply chain management); SOURCE code; COMPUTER programming education
- Publication
CMES-Computer Modeling in Engineering & Sciences, 2024, Vol 139, Issue 2, p2069
- ISSN
1526-1492
- Publication type
Article
- DOI
10.32604/cmes.2023.043248
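
- Illustrative sketch (not part of the original record)
The abstract describes selecting the highest-scoring test case from each cluster and evaluating the resulting order with APFD. The following is a minimal sketch of those two steps, not the authors' implementation. It assumes cluster labels and bug-likelihood scores have already been produced by some upstream models; the function names `select_representatives` and `apfd`, the descending-score ordering of the selected cases, and the sample data are all hypothetical. APFD is computed with the standard formula 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where TF_i is the 1-based position of the first test that reveals fault i.

```python
def select_representatives(cluster_ids, scores):
    """Pick the index of the highest-scoring test case in each cluster.

    cluster_ids[i] is the (assumed, precomputed) cluster label of test i and
    scores[i] its (assumed, precomputed) bug-likelihood score.
    The result is ordered by descending score, which is an assumption
    made for this sketch rather than something the abstract specifies.
    """
    best = {}  # cluster label -> index of best test seen so far
    for idx, (cid, score) in enumerate(zip(cluster_ids, scores)):
        if cid not in best or score > scores[best[cid]]:
            best[cid] = idx
    return sorted(best.values(), key=lambda i: scores[i], reverse=True)


def apfd(order, faults_by_test, num_faults):
    """Average percentage of faults detected for one execution order.

    order: test indices in the order they are run.
    faults_by_test: maps a test index to the set of fault ids it reveals.
    num_faults: total number of distinct faults (all must be revealed).
    """
    n = len(order)
    first_pos = {}  # fault id -> 1-based position of first revealing test
    for pos, test in enumerate(order, start=1):
        for fault in faults_by_test.get(test, ()):
            first_pos.setdefault(fault, pos)
    if len(first_pos) != num_faults:
        raise ValueError("standard APFD assumes every fault is detected")
    return 1 - sum(first_pos.values()) / (n * num_faults) + 1 / (2 * n)


# Hypothetical data: five test cases in three clusters.
cluster_ids = [0, 0, 1, 1, 2]
scores = [0.2, 0.9, 0.4, 0.1, 0.7]
order = select_representatives(cluster_ids, scores)
value = apfd(order, {1: {"a"}, 2: {"b"}}, num_faults=2)
```

With this sample data the selected order contains one test per cluster, so two of the five test cases are skipped; that skipping is the source of the time savings the abstract refers to.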