A Large Language Model Approach to Identify Flakiness in C++ Projects

Xin Sun; Daniel Ståhl; Kristian Sandahl

arXiv:2412.12340·cs.SE·June 9, 2025

Open Access

TL;DR

This paper presents an approach that uses fine-tuned large language models to identify the root cause of flaky tests in C++ projects at the code level, with the aim of helping developers debug and resolve them more efficiently.

Contribution

The paper introduces a method based on fine-tuned LLMs for identifying flaky tests in C++ and Java code, together with a collection of C++ flaky tests sourced from GitHub repositories, an evaluation, and practical recommendations.

Findings

01 Mistral-7b outperforms other models on all metrics

02 Models perform comparably on C++ and Java datasets

03 LLMs demonstrate high accuracy in classifying flaky tests

Abstract

Regression testing plays a crucial role in software testing: it ensures that new modifications do not disrupt the existing functionality and behaviour of the software system. Ideally, regression tests yield identical results as long as no modifications are made to the system under test. In practice, however, flaky tests introduce non-deterministic behaviour and undermine the reliability of regression testing results. In this paper, we propose an LLM-based approach for identifying the root cause of flaky tests in C++ projects at the code level, with the intention of assisting developers in debugging and resolving them more efficiently. We compile a comprehensive collection of C++ project flaky tests sourced from GitHub repositories. We fine-tune Mistral-7b, Llama2-7b and CodeLlama-7b models on the C++ dataset and an existing Java dataset and…

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices