$\mu$BERT: Mutation Testing using Pre-Trained Language Models

Renzo Degiovanni; Mike Papadakis

arXiv:2203.03289·cs.SE·March 8, 2022·1 cites

TL;DR

$\mu$BERT is a mutation testing tool that uses a pre-trained language model to generate mutants, demonstrating improved fault detection and cost-effectiveness over traditional tools such as PiTest.

Contribution

It introduces a novel mutation testing approach using CodeBERT for mutant generation, enhancing fault detection and cost efficiency.

Findings

01

Detects 27 of the 40 real faults taken from Defects4J, against 26 for PiTest.

02

Can be twice as cost-effective as PiTest when the same number of mutants is analysed.

03

Produces mutants that help program assertion inference techniques generate better specifications.

Abstract

We introduce $\mu$BERT, a mutation testing tool that uses a pre-trained language model (CodeBERT) to generate mutants. This is done by masking a token from the expression given as input and using CodeBERT to predict it. Thus, the mutants are generated by replacing the masked tokens with the predicted ones. We evaluate $\mu$BERT on 40 real faults from Defects4J and show that it can detect 27 out of the 40 faults, while the baseline (PiTest) detects 26 of them. We also show that $\mu$BERT can be 2 times more cost-effective than PiTest when the same number of mutants is analysed. Additionally, we evaluate the impact of $\mu$BERT's mutants when used by program assertion inference techniques, and show that they can help in producing better specifications. Finally, we discuss the quality and naturalness of some interesting mutants produced by $\mu$BERT during our experimental…
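
The masking-and-prediction loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the regex tokenizer, the function names `tokenize`, `generate_mutants`, and `toy_predict`, and the rule that drops predictions equal to the original token are all hypothetical, and `toy_predict` is a hard-coded stand-in for a query to a masked language model such as CodeBERT.

```python
import re
from typing import Callable, List

# Mask token used by RoBERTa-style models; assumed here for illustration.
MASK = "<mask>"

# Hypothetical tokenizer: identifiers, integers, a few two-character
# operators, then any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|&&|\|\||\S")


def tokenize(expression: str) -> List[str]:
    """Split an expression into coarse tokens."""
    return TOKEN_RE.findall(expression)


def generate_mutants(
    expression: str, predict: Callable[[str], List[str]]
) -> List[str]:
    """Mask each token in turn and substitute the predicted tokens.

    `predict` receives the expression with one token replaced by MASK
    and returns candidate tokens for that position.
    """
    tokens = tokenize(expression)
    mutants: List[str] = []
    for i, original in enumerate(tokens):
        masked = " ".join(tokens[:i] + [MASK] + tokens[i + 1:])
        for candidate in predict(masked):
            # Assumption: a prediction identical to the original token
            # reproduces the input, so it is not kept as a mutant.
            if candidate == original:
                continue
            mutant = " ".join(tokens[:i] + [candidate] + tokens[i + 1:])
            if mutant not in mutants:
                mutants.append(mutant)
    return mutants


def toy_predict(masked_expression: str) -> List[str]:
    """Hard-coded stand-in for a masked-language-model prediction."""
    table = {
        "<mask> < b": ["a", "c"],
        "a <mask> b": ["<", "<=", ">"],
    }
    return table.get(masked_expression, [])


if __name__ == "__main__":
    for mutant in generate_mutants("a < b", toy_predict):
        print(mutant)
```

With the stand-in predictor, `generate_mutants("a < b", toy_predict)` yields `c < b`, `a <= b`, and `a > b`. Swapping `toy_predict` for a real model query leaves the loop unchanged, since the predictor is passed in as a plain function.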

Code & Models

Repositories

rdegiovanni/mbert
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques