Using Relative Lines of Code to Guide Automated Test Generation for Python
Josie Holmes, Iftekhar Ahmed, Caius Brindescu, Rahul Gopinath, He Zhang, Alex Groce

TL;DR
This paper introduces a heuristic based on relative lines of code to improve automated test generation in Python, significantly enhancing coverage and fault detection with minimal overhead.
Contribution
It presents a novel LOC-based heuristic for guiding automated testing, especially effective in languages with high coverage data collection costs.
Findings
Improves branch and statement coverage by 20-40% or more.
Enhances fault detection rates by 75-400%.
Easily combines with other testing approaches.
Abstract
Raw lines of code (LOC) is a metric that does not, at first glance, seem extremely useful for automated test generation. It is both highly language-dependent and not extremely meaningful, semantically, within a language: one coder can produce the same effect with many fewer lines than another. However, relative LOC, between components of the same project, turns out to be a highly useful metric for automated testing. In this paper, we make use of a heuristic based on LOC counts for tested functions to dramatically improve the effectiveness of automated test generation. This approach is particularly valuable in languages where collecting code coverage data to guide testing has a very high overhead. We apply the heuristic to property-based Python testing using the TSTL (Template Scripting Testing Language) tool. In our experiments, the simple LOC heuristic can improve branch and statement…
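To make the idea of relative LOC concrete, here is a minimal Python sketch of one way such a heuristic could bias a random test generator: each candidate action is chosen with probability proportional to the LOC of the function it calls. This is an illustration under stated assumptions, not the paper's or TSTL's actual implementation. The abstract does not give the exact weighting formula, so the proportional weighting is an assumption, and the names `count_loc`, `relative_loc_weights`, and `choose_action` are hypothetical.

```python
import random


def count_loc(source: str) -> int:
    """Count non-blank, non-comment lines in a source string.

    Simplification: docstring lines are counted as code.
    """
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def relative_loc_weights(loc_by_action: dict) -> dict:
    """Turn per-action LOC counts into selection probabilities.

    Assumption: probability is directly proportional to LOC. The paper
    may use a different weighting.
    """
    total = sum(loc_by_action.values())
    if total == 0:
        # No size information at all: fall back to a uniform choice.
        return {name: 1 / len(loc_by_action) for name in loc_by_action}
    return {name: loc / total for name, loc in loc_by_action.items()}


def choose_action(loc_by_action: dict, rng: random.Random) -> str:
    """Pick the next action to run, biased toward larger functions."""
    weights = relative_loc_weights(loc_by_action)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical LOC counts for three functions of a library under test.
loc_by_action = {"insert": 30, "delete": 25, "size": 5}
rng = random.Random(0)
print(relative_loc_weights(loc_by_action))
print(choose_action(loc_by_action, rng))
```

Because the weights come only from static line counts within one project, a scheme like this needs no coverage instrumentation at test time, which fits the abstract's point about languages where collecting coverage data is expensive.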
