Empirical Derivations from an Evolving Test Suite
Jukka Ruohonen, Abhishek Tiwari

TL;DR
This paper provides a longitudinal empirical analysis of the NetBSD operating system's evolving test suite, revealing growth patterns, stability of failures, and limited correlation with code changes over time.
Contribution
It characterizes the long-term behavior of an evolving, large-scale software test suite through a longitudinal empirical analysis of NetBSD's testing framework.
Findings
The test suite grew continuously and now covers over ten thousand individual test cases.
Failed test cases remained stable overall, apart from shorter periods with more frequent failures.
Code churn and kernel modifications did not explain failures consistently over time.
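To illustrate what a lack of longitudinal consistency can look like, the following minimal sketch computes a correlation between code churn and test failures separately for consecutive time windows. It is not the paper's method: this summary does not state which statistical models the authors used, so Pearson correlation is swapped in as a simple stand-in. The function names `pearson` and `windowed_correlations`, the window size, and the data are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def windowed_correlations(churn, failures, window):
    """Correlate churn with failures in consecutive, non-overlapping windows."""
    if len(churn) != len(failures):
        raise ValueError("series must have equal length")
    return [
        pearson(churn[i:i + window], failures[i:i + window])
        for i in range(0, len(churn) - window + 1, window)
    ]


# Made-up illustrative data, NOT taken from the paper.
churn = [1, 2, 3, 4, 4, 3, 2, 1]
failures = [2, 4, 6, 8, 1, 2, 3, 4]

for index, r in enumerate(windowed_correlations(churn, failures, window=4)):
    print(f"window {index}: r = {r:+.2f}")
```

In this toy data the association is strongly positive in the first window and strongly negative in the second, so an estimate from one period would not carry over to the next. That is the kind of pattern a period-by-period analysis is meant to expose.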
Abstract
The paper presents a longitudinal empirical analysis of the automated, continuous, and virtualization-based software test suite of the NetBSD operating system. The observation period spans from the initial rollout of the test suite in the early 2010s to late 2025. According to the results, the test suite has grown continuously and currently covers over ten thousand individual test cases. Failed test cases exhibit overall stability, although there have been shorter periods marked by more frequent failures. A similar observation applies to build failures, failures of the test suite to complete, and installation failures, all of which are also captured by NetBSD's testing framework. Finally, code churn and kernel modifications do not provide longitudinally consistent statistical explanations for the failures. Although some periods exhibit larger effects, including…
