Understanding Bug-Reproducing Tests: A First Empirical Study
Andre Hora, Gordon Fraser

TL;DR
This empirical study analyzes 642 bug-reproducing tests from 15 real-world Python systems. It finds that they do not differ statistically significantly from other tests in LOC, number of assertions, or complexity, but contain slightly more try/except blocks and weak assertions. Most of them reproduce a single bug.
Contribution
First empirical analysis of bug-reproducing tests, highlighting their properties and differences from regular tests in real-world Python projects.
Findings
Bug-reproducing tests do not differ statistically significantly from other tests in LOC, number of assertions, or complexity.
They contain slightly more try/except blocks and weak assertions.
The majority reproduce a single bug; few reproduce multiple bugs.
Abstract
Developers create bug-reproducing tests that support debugging by failing as long as the bug is present, and passing once the bug has been fixed. These tests are usually integrated into existing test suites and executed regularly alongside all other tests to ensure that future regressions are caught. Despite this co-existence with other types of tests, the properties of bug-reproducing tests are scarcely researched, and it remains unclear whether they differ fundamentally. In this short paper, we provide an initial empirical study to understand bug-reproducing tests better. We analyze 642 bug-reproducing tests of 15 real-world Python systems. Overall, we find that bug-reproducing tests are not (statistically significantly) different from other tests regarding LOC, number of assertions, and complexity. However, bug-reproducing tests contain slightly more try/except blocks and ``weak…
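To make the fail-then-pass behavior described in the abstract concrete, here is a minimal sketch of what a bug-reproducing test might look like. It is not taken from the paper or from any of the studied systems: `slugify_buggy`, `slugify_fixed`, `make_bug_reproducing_test`, and `run` are hypothetical names invented for illustration. Treating `assertIsNotNone` as a "weak" assertion is also an assumption about what that term covers.

```python
import unittest


def slugify_buggy(title):
    # Hypothetical buggy implementation: raises IndexError on an empty title.
    return title.split()[0].lower()


def slugify_fixed(title):
    # Hypothetical fix: an empty title yields an empty slug instead of crashing.
    words = title.split()
    return words[0].lower() if words else ""


def make_bug_reproducing_test(func):
    """Build a test case that exercises `func` with the bug-triggering input."""

    class BugReproducingTest(unittest.TestCase):
        def test_empty_title_does_not_crash(self):
            # try/except block: the bug manifests as an exception.
            try:
                result = func("")
            except IndexError:
                self.fail("bug reproduced: IndexError on empty title")
            # Assumed example of a weak assertion: it only checks that some
            # value came back, not what the value is.
            self.assertIsNotNone(result)

    return BugReproducingTest


def run(func):
    """Run the bug-reproducing test against `func`; return True if it passes."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(make_bug_reproducing_test(func))
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()
```

Under these assumptions, `run(slugify_buggy)` reports a failing test while the bug is present, and `run(slugify_fixed)` reports a passing test once the fix is in place.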
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Teaching and Learning Programming
