A Large Language Model Approach to Identify Flakiness in C++ Projects
Xin Sun, Daniel Ståhl, Kristian Sandahl

TL;DR
This paper presents an approach that uses large language models to classify flaky tests in C++ projects by root cause at the code level, with the aim of helping developers debug and resolve them more efficiently.
Contribution
It introduces a fine-tuned LLM-based method for detecting flaky tests in C++ and Java, with comprehensive evaluation and practical recommendations.
Findings
Mistral-7b outperforms other models on all metrics
Models perform comparably on C++ and Java datasets
LLMs demonstrate high accuracy in classifying flaky tests
Abstract
Regression testing plays a crucial role in software testing: it ensures that new modifications do not disrupt the existing functionality and behaviour of the software system. Ideally, regression tests yield identical results whenever the system under test has not been modified. In practice, however, flaky tests introduce non-deterministic behaviour and undermine the reliability of regression testing results. In this paper, we propose an LLM-based approach for identifying the root cause of flaky tests in C++ projects at the code level, with the intention of assisting developers in debugging and resolving them more efficiently. We compile a comprehensive collection of C++ project flaky tests sourced from GitHub repositories. We fine-tune Mistral-7b, Llama2-7b and CodeLlama-7b models on the C++ dataset and an existing Java dataset and…
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices
