TL;DR
This paper introduces Flakify, a novel black-box, language model-based predictor for flaky test cases that relies solely on test case source code, eliminating the need for project-specific features or production code access.
Contribution
Flakify is the first approach to use a pre-trained language model fine-tuned on test code for flaky test prediction without requiring feature engineering or production code.
Findings
Flakify outperforms FlakeFlagger by 10 percentage points in precision and 18 percentage points in recall.
It achieves high F1-scores on two public datasets with cross-validation and per-project validation.
A black-box version of FlakeFlagger, i.e., one that does not use production code, is ineffective for flaky test prediction.
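To make the reported metrics and the per-project validation concrete, here is a minimal sketch in Python. It is not the authors' evaluation code. The function names and the sample layout are hypothetical. It also assumes that "per-project validation" means holding out all test cases of one project for testing and training on the remaining projects.

```python
from collections import defaultdict
from typing import Iterator, List, Tuple

# Hypothetical sample layout: (project name, test case source code, is_flaky label).
Sample = Tuple[str, str, bool]


def precision_recall_f1(y_true: List[bool], y_pred: List[bool]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the 'flaky' (positive) class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))        # flaky, predicted flaky
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))  # not flaky, predicted flaky
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))  # flaky, predicted not flaky
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def per_project_splits(samples: List[Sample]) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held-out project, training samples, test samples), one split per project.

    Assumption: the held-out project's test cases never appear in the training set.
    """
    by_project = defaultdict(list)
    for sample in samples:
        by_project[sample[0]].append(sample)
    for held_out in sorted(by_project):
        train = [s for s in samples if s[0] != held_out]
        yield held_out, train, by_project[held_out]
```

A gain of some percentage points in precision means fewer non-flaky tests wrongly flagged as flaky. The same gain in recall means fewer flaky tests missed. Splitting by project rather than at random checks whether a predictor trained only on test code generalizes to projects it has never seen.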
Abstract
Software testing assures that code changes do not adversely affect existing functionality. However, a test case can be flaky, i.e., passing and failing across executions, even for the same version of the source code. Flaky test cases introduce overhead to software development as they can lead to unnecessary attempts to debug production or testing code. The state-of-the-art ML-based flaky test case predictors rely on pre-defined sets of features that are either project-specific or require access to production code, which is not always available to software test engineers. Therefore, in this paper, we propose Flakify, a black-box, language model-based predictor for flaky test cases. Flakify relies exclusively on the source code of test cases, thus not requiring (a) access to production code (black-box), (b) rerunning test cases, or (c) pre-defining features. To this end, we employed CodeBERT, a…
