Metamorphic Testing of Deep Code Models: A Systematic Literature Review
Ali Asgari, Milan de Koning, Pouria Derakhshanfar, Annibale Panichella

TL;DR
This paper systematically reviews how metamorphic testing is applied to evaluate the robustness of deep code models, analyzing 45 studies to identify current practices, challenges, and future research directions.
Contribution
It provides a comprehensive synthesis of metamorphic testing techniques for deep code models, highlighting common transformations, evaluation methods, and research gaps.
Findings
Metamorphic testing is widely used to evaluate deep code model robustness.
Common transformations include variable renaming and code refactoring.
Future research should address challenges in evaluation metrics and diverse programming languages.
Abstract
Large language models and deep learning models designed for code intelligence have revolutionized the software engineering field due to their ability to perform various code-related tasks. These models can process source code and software artifacts with high accuracy in tasks such as code completion, defect detection, and code summarization; therefore, they can potentially become an integral part of modern software engineering practices. Despite these capabilities, robustness remains a critical quality attribute for deep-code models as they may produce different results under varied and adversarial conditions (e.g., variable renaming). Metamorphic testing has become a widely used approach to evaluate models' robustness by applying semantic-preserving transformations to input programs and analyzing the stability of model outputs. While prior research has explored testing deep learning…
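To make the idea of a semantic-preserving transformation concrete, the following is a minimal sketch of one metamorphic relation, variable renaming, checked against a model's output. It is illustrative only and is not taken from the reviewed studies: `rename_variable`, `violates_relation`, `brittle_model`, and `stable_model` are hypothetical names, the two "models" are toy stand-ins for a real deep code model, and the renamer assumes the new name does not clash with an existing identifier. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Rename every occurrence of one identifier (names and function arguments)."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        self.generic_visit(node)  # also visit any annotation on the argument
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename_variable(source, old, new):
    """Return `source` with identifier `old` renamed to `new`.

    Assumption: `new` is not already used in `source`, so behaviour is preserved.
    """
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


def violates_relation(model, source, old, new):
    """Metamorphic relation: the prediction should not change under renaming."""
    return model(source) != model(rename_variable(source, old, new))


def brittle_model(source):
    # Toy stand-in: keys on a surface token, so renaming can flip its output.
    return "buggy" if "tmp" in source else "clean"


def stable_model(source):
    # Toy stand-in: keys on program structure, which renaming leaves intact.
    has_loop = any(isinstance(n, (ast.For, ast.While)) for n in ast.walk(ast.parse(source)))
    return "has_loop" if has_loop else "no_loop"


SAMPLE = (
    "def total(xs):\n"
    "    acc = 0\n"
    "    for x in xs:\n"
    "        acc += x\n"
    "    return acc\n"
)

if __name__ == "__main__":
    print(rename_variable(SAMPLE, "acc", "tmp"))
    print("brittle model violates relation:", violates_relation(brittle_model, SAMPLE, "acc", "tmp"))
    print("stable model violates relation:", violates_relation(stable_model, SAMPLE, "acc", "tmp"))
```

In a real study the toy functions would be replaced by calls to the model under test, and the fraction of inputs for which the relation is violated would serve as one possible robustness measure.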
