Large Language Models for Equivalent Mutant Detection: How Far Are We?
Zhao Tian, Honglin Shu, Dong Wang, Xuejie Cao, Yasutaka Kamei, Junjie Chen

TL;DR
This study empirically evaluates large language models for detecting equivalent mutants in Java code, showing they outperform existing methods with a good balance of cost and accuracy.
Contribution
First comprehensive empirical analysis of LLMs for equivalent mutant detection, demonstrating their superior performance and efficiency over traditional techniques.
Findings
LLM-based techniques outperform existing EMD methods by 35.69% in F1-score.
Fine-tuned code embeddings are the most effective LLM strategy.
LLMs offer a good balance between detection effectiveness and computational cost.
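For reference, the F1-score cited above is the standard harmonic mean of precision and recall. The sketch below assumes that "equivalent" is treated as the positive class, which this summary does not state; TP, FP, and FN denote the usual true-positive, false-positive, and false-negative counts.

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 P R}{P + R}
\]
```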
Abstract
Mutation testing is vital for ensuring software quality. However, the presence of equivalent mutants is known to introduce redundant cost and bias issues, hindering the effectiveness of mutation testing in practical use. Although numerous equivalent mutant detection (EMD) techniques have been proposed, they exhibit limitations due to the scarcity of training data and challenges in generalizing to unseen mutants. Recently, large language models (LLMs) have been extensively adopted in various code-related tasks and have shown superior performance by more accurately capturing program semantics. Yet the performance of LLMs in equivalent mutant detection remains largely unclear. In this paper, we conduct an empirical study on 3,302 method-level Java mutant pairs to comprehensively investigate the effectiveness and efficiency of LLMs for equivalent mutant detection. Specifically, we assess…
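To make the notion of an equivalent mutant concrete, here is a minimal, hypothetical Java sketch; it is not taken from the paper's dataset, and the class and method names are invented for illustration. A mutant is equivalent when no input can distinguish it from the original method, so no test can ever kill it.

```java
public class EquivalentMutantDemo {

    // Original method: sums the elements of a non-null array.
    public static int sumOriginal(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Mutant A (relational operator replaced: "<" becomes "!=").
    // i starts at 0 and grows by exactly 1, so it always reaches
    // values.length without skipping it. The loop runs the same
    // iterations for every input: this mutant is equivalent.
    public static int sumMutantEquivalent(int[] values) {
        int total = 0;
        for (int i = 0; i != values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Mutant B (arithmetic operator replaced: "+=" becomes "-=").
    // Any array with a non-zero sum exposes the change, so this
    // mutant is non-equivalent and a test can kill it.
    public static int sumMutantKillable(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total -= values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3};
        System.out.println(sumOriginal(input));         // prints 6
        System.out.println(sumMutantEquivalent(input)); // prints 6
        System.out.println(sumMutantKillable(input));   // prints -6
    }
}
```

An EMD technique would ideally label the pair (original, Mutant A) as equivalent and the pair (original, Mutant B) as non-equivalent. Mutant A inflates the cost of mutation testing because no test suite can kill it, which is the redundancy and bias problem the abstract describes.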
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Genetics, Bioinformatics, and Biomedical Research
