A Dataset of Reproducible Flaky-Test Failures
Suzzana Rafi, Mahbub-Ul-Hoque Sumon, Md Erfan, Maruf Morshed Khan, August Shi, and Wing Lam

TL;DR
This paper introduces ReproFlake, a dataset of 1,115 reproducible flaky tests. Each test comes with environment setup, reproduction scripts, fixes, and logs to support research on flaky-test detection and repair.
Contribution
ReproFlake is the first flaky-test dataset to provide reproducible environments, scripts, fixes, and logs, making flaky-test failures easier to understand and handle.
Findings
Error information helps identify the category of a flaky test and guide its repair.
Compilation failures that could not be resolved highlight the challenges of working with legacy projects.
Knowing where a fix is located can help prioritize repair efforts.
Abstract
Flaky tests pass and fail non-deterministically when run on the same version of code. Although many techniques have been proposed to detect, debug, and repair flaky tests, reproducing their failures remains a major challenge due to their inherent nondeterminism. Many flaky test datasets exist to help researchers study them, but these datasets are often composed of disjoint sets of flaky tests, where each dataset provides unique information, such as flaky tests of different categories, failure logs of flaky tests, or flaky tests reported by developers vs. flaky tests found by automated tools. In this work, we aim to create a reproducible dataset of flaky tests, curated from both developer issue reports and a popular dataset of flaky tests. Compared to prior flaky test datasets, our dataset is the first to provide (1) a reproducible environment to compile flaky tests, (2) scripts to…
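To make the opening definition concrete, here is a minimal sketch of rerun-based flakiness checking: run the same test repeatedly on unchanged code and flag it as flaky if both passing and failing outcomes appear. This is an illustration only, not the dataset's reproduction scripts. The names `rerun_outcomes`, `is_flaky`, and `make_hypothetical_flaky_test` are assumptions introduced here, and the toy test uses a hidden call counter as a deterministic stand-in for real sources of nondeterminism such as timing or shared state.

```python
from typing import Callable, List


def rerun_outcomes(test: Callable[[], None], runs: int) -> List[bool]:
    """Run `test` `runs` times on unchanged code; True means the run passed."""
    outcomes = []
    for _ in range(runs):
        try:
            test()
            outcomes.append(True)
        except AssertionError:
            outcomes.append(False)
    return outcomes


def is_flaky(outcomes: List[bool]) -> bool:
    """Flaky here means the same code produced at least one pass and one failure."""
    return any(outcomes) and not all(outcomes)


def make_hypothetical_flaky_test() -> Callable[[], None]:
    """Build a toy test that fails on every third call (hypothetical example).

    The hidden counter stands in for leaked shared state; real flaky tests
    fail for reasons that are usually much harder to trigger on demand.
    """
    state = {"calls": 0}

    def test() -> None:
        state["calls"] += 1
        assert state["calls"] % 3 != 0, "simulated intermittent failure"

    return test


def stable_test() -> None:
    """A test whose outcome never changes between runs."""
    assert 1 + 1 == 2


flaky_outcomes = rerun_outcomes(make_hypothetical_flaky_test(), runs=6)
stable_outcomes = rerun_outcomes(stable_test, runs=6)
```

In this sketch the toy test yields a mix of passes and failures across six runs, so `is_flaky(flaky_outcomes)` is true, while the stable test only ever passes. In practice a failure may need many reruns or a specific environment to show up, which is why a dataset with ready-made environments and scripts is useful.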
