What is the Vocabulary of Flaky Tests? An Extended Replication
B. H. P. Camara, M. A. G. Silva, A. T. Endo, S. R. Vergilio

TL;DR
This study empirically investigates the use of code identifiers to predict flaky tests, replicating prior work with different tools and validating across multiple datasets, revealing consistent results but variable recall performance.
Contribution
It extends previous research by replicating flaky test prediction with new ML tools and validating models across diverse datasets and projects.
Findings
Replicated previous flaky test prediction results with minor differences.
Different ML algorithms showed similar performance to prior methods.
Recall decreased in cross-project validation, indicating challenges in generalization.
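Recall here is the share of truly flaky tests that a model flags, TP/(TP+FN). Precision is the share of flagged tests that are truly flaky, TP/(TP+FP). The minimal Python sketch below makes the recall finding concrete. The function name `precision_recall_f1` and the labels are hypothetical and are not taken from the study. It shows how a model can still score reasonably on precision while its recall drops because it misses flaky tests, as can happen when it is applied to tests from a project it was not trained on.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (flaky) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = flaky, 0 = non-flaky.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]  # two flaky tests missed (false negatives)
print(precision_recall_f1(y_true, y_pred))
```

With these made-up labels, precision is 2/3 but recall is only 1/2, because half of the flaky tests go undetected.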
Abstract
Software systems are continuously evolved and delivered with high quality thanks to the widespread adoption of automated tests. A recurring issue that undermines this scenario is the presence of flaky tests: test cases that may pass or fail non-deterministically. A promising approach, though one that still lacks empirical evidence, is to collect static data from automated tests and use it to predict their flakiness. In this paper, we conducted an empirical study to assess the use of code identifiers to predict test flakiness. To do so, we first replicated most parts of the previous study by Pinto et al. (MSR 2020). We extended this replication by using a different Python ML platform (Scikit-learn) and by adding other learning algorithms to the analyses. Then, we validated the performance of the trained models on datasets containing other flaky tests and from different projects. We successfully…
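The core idea is to treat the identifiers in a test's source code as a bag-of-words "vocabulary" and train a classifier on it. The dependency-free Python sketch below illustrates this under stated assumptions. It assumes identifiers are split on camelCase and underscores and lower-cased. The classifier is a hand-written multinomial Naive Bayes with Laplace smoothing, chosen only so the sketch runs without third-party packages. It is a stand-in, not the Scikit-learn pipeline or the algorithms used in the study. The names `vocabulary` and `VocabularyNaiveBayes` and all test snippets are hypothetical.

```python
import math
import re
from collections import Counter

# Crude tokenizer (assumption): any identifier-like word, including keywords
# such as "void" and words inside string literals, counts as vocabulary.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Splits camelCase / acronyms: "HTTPClient" -> "HTTP", "Client".
CAMEL = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def vocabulary(test_source: str) -> list:
    """Split every identifier in a test body into lower-cased word tokens."""
    tokens = []
    for ident in IDENT.findall(test_source):
        for part in ident.split("_"):
            tokens.extend(w.lower() for w in CAMEL.findall(part))
    return tokens


class VocabularyNaiveBayes:
    """Multinomial Naive Bayes over identifier tokens (1 = flaky, 0 = not).

    Assumes the training data contains at least one example of each class.
    """

    def fit(self, sources, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for src, y in zip(sources, labels):
            self.word_counts[y].update(vocabulary(src))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, y):
        total = sum(self.word_counts[y].values())
        # Log prior of the class, estimated from document frequencies.
        score = math.log(self.doc_counts[y] / sum(self.doc_counts.values()))
        for t in tokens:
            if t in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[y][t] + 1  # Laplace smoothing
                score += math.log(count / (total + len(self.vocab)))
        return score

    def predict(self, source):
        tokens = vocabulary(source)
        return max((0, 1), key=lambda y: self._log_score(tokens, y))


# Hypothetical training data: Java-like test bodies with flakiness labels.
train_sources = [
    "void testAsyncUpload() { Thread.sleep(500); assertTrue(future.isDone()); }",
    "void testServerTimeout() { server.start(); waitForResponse(timeout); }",
    "void testAddItems() { list.add(item); assertEquals(1, list.size()); }",
    'void testParseName() { assertEquals("bob", parser.parseName(input)); }',
]
train_labels = [1, 1, 0, 0]

model = VocabularyNaiveBayes().fit(train_sources, train_labels)
print(model.predict("void testRetryAfterSleep() { Thread.sleep(100); assertTrue(future.isDone()); }"))
print(model.predict("void testAddName() { list.add(name); assertEquals(2, list.size()); }"))
```

On this toy data, the first unseen test is classified as flaky because it shares words such as `sleep`, `thread`, and `future` with the flaky examples. The second is classified as non-flaky. A cross-project check, as described in the abstract, would train on one project's tests and call `predict` on another's.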
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research
