Do Automatic Test Generation Tools Generate Flaky Tests?
Martin Gruber, Muhammad Firhard Roslan, Owain Parry, Fabian Scharnböck, Phil McMinn, Gordon Fraser

TL;DR
This study investigates whether automated test generation tools produce flaky tests. It finds that they do, at rates at least as high as those of developer-written tests, and evaluates suppression mechanisms that substantially reduce flakiness.
Contribution
It provides the first large-scale analysis of flaky tests produced by test generation tools, comparing their causes with those of developer-written flaky tests and evaluating the effectiveness of flakiness suppression techniques.
Findings
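Flaky tests are at least as common among generated tests as among developer-written tests.
Existing suppression mechanisms reduce the number of flaky tests by 71.7%.
The causes of flakiness in generated tests differ from those in developer-written tests, with a larger share due to randomness.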
Abstract
Non-deterministic test behavior, or flakiness, is common and dreaded among developers. Researchers have studied the issue and proposed approaches to mitigate it. However, the vast majority of previous work has only considered developer-written tests. The prevalence and nature of flaky tests produced by test generation tools remain largely unknown. We ask whether such tools also produce flaky tests and how these differ from developer-written ones. Furthermore, we evaluate mechanisms that suppress flaky test generation. We sample 6 356 projects written in Java or Python. For each project, we generate tests using EvoSuite (Java) and Pynguin (Python), and execute each test 200 times, looking for inconsistent outcomes. Our results show that flakiness is at least as common in generated tests as in developer-written tests. Nevertheless, existing flakiness suppression mechanisms implemented in…
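The detection step described in the abstract (execute each test many times and look for inconsistent outcomes) can be illustrated with a minimal rerun-based sketch. This is not the study's actual harness: the function names (`rerun_outcomes`, `is_flaky`) and the two example tests are hypothetical, and only the rerun count of 200 comes from the text above. A test is flagged as flaky if its reruns yield more than one distinct outcome; the random-draw test mimics randomness-induced flakiness, using a seeded generator only so the demo is reproducible.

```python
import random
from collections import Counter
from typing import Callable

# Number of reruns per test; 200 matches the study's setup, the rest is illustrative.
DEFAULT_RUNS = 200


def rerun_outcomes(test: Callable[[], None], runs: int = DEFAULT_RUNS) -> Counter:
    """Run `test` repeatedly and tally each outcome as 'pass', 'fail', or 'error'."""
    outcomes: Counter = Counter()
    for _ in range(runs):
        try:
            test()
            outcomes["pass"] += 1
        except AssertionError:
            outcomes["fail"] += 1  # an assertion in the test did not hold
        except Exception:
            outcomes["error"] += 1  # any other exception raised by the test
    return outcomes


def is_flaky(outcomes: Counter) -> bool:
    """Flag a test as flaky if its reruns produced more than one distinct outcome."""
    return len(outcomes) > 1


def deterministic_test() -> None:
    # Hypothetical test whose result never changes between runs.
    assert sorted([3, 1, 2]) == [1, 2, 3]


# Seeded generator: the demo is reproducible, but each call still draws a new value.
_rng = random.Random(0)


def random_test() -> None:
    # Hypothetical test whose outcome depends on a random draw.
    assert _rng.random() < 0.5


if __name__ == "__main__":
    print(is_flaky(rerun_outcomes(deterministic_test)))  # → False
    print(is_flaky(rerun_outcomes(random_test)))  # → True
```

Real test runners distinguish more outcome kinds (skips, timeouts, setup errors), and a rerun-based check can only show that a test is flaky, never prove that it is not; a test that passed all 200 runs may still fail on a later run.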
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software System Performance and Reliability
