Empirical Derivations from an Evolving Test Suite

Jukka Ruohonen; Abhishek Tiwari

arXiv:2511.00915 · cs.SE · May 8, 2026

TL;DR

This paper provides a longitudinal empirical analysis of the NetBSD operating system's evolving test suite, revealing growth patterns, stability of failures, and limited correlation with code changes over time.

Contribution

The paper offers new empirical insights into the long-term behavior and characteristics of a large-scale, evolving software test suite.

Findings

01

The test suite grew to over ten thousand test cases.

02

Failure rates remained relatively stable overall, apart from shorter periods with more frequent failures.

03

Code churn and kernel modifications did not consistently explain failures over time.

Abstract

The paper presents a longitudinal empirical analysis of the automated, continuous, and virtualization-based software test suite of the NetBSD operating system. The longitudinal period observed spans from the initial rollout of the test suite in the early 2010s to late 2025. According to the results, the test suite has grown continuously and currently covers over ten thousand individual test cases. Failed test cases exhibit overall stability, although there have been shorter periods marked by more frequent failures. A similar observation applies to build failures, failures of the test suite to complete, and installation failures, all of which are also captured by NetBSD's testing framework. Finally, code churn and kernel modifications do not provide longitudinally consistent statistical explanations for the failures. Although some periods exhibit larger effects, including…
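
The phrase "longitudinally consistent" can be made concrete: an association measured in one period need not hold in another, and pooling all periods can hide it. The sketch below is a hypothetical illustration, not the authors' method (the abstract does not state which statistical model was used). It computes a Pearson correlation between per-period code churn and failure counts within consecutive, non-overlapping windows. The function names, the windowing scheme, and the toy series are assumptions made purely for illustration and are not data from the paper.

```python
from math import sqrt
from typing import List, Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson correlation coefficient; returns None if either series is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant series
    return sxy / sqrt(sxx * syy)


def windowed_correlations(
    churn: Sequence[float], failures: Sequence[float], window: int
) -> List[Optional[float]]:
    """Hypothetical helper: correlation of churn vs. failures per non-overlapping window.

    Trailing observations that do not fill a complete window are ignored.
    """
    if window < 2:
        raise ValueError("window must contain at least 2 observations")
    results = []
    for start in range(0, len(churn) - window + 1, window):
        end = start + window
        results.append(pearson(churn[start:end], failures[start:end]))
    return results


if __name__ == "__main__":
    # Toy series (invented for illustration): two periods of four observations each.
    churn = [1, 2, 3, 4, 1, 2, 3, 4]
    failures = [2, 4, 6, 8, 8, 6, 4, 2]
    print(windowed_correlations(churn, failures, window=4))  # [1.0, -1.0]
    print(pearson(churn, failures))  # 0.0 when both periods are pooled
```

In this toy case each window shows a perfect association, but with opposite signs, so the pooled coefficient is zero. A per-period view of this kind is one simple way to check whether an explanatory variable behaves consistently across a long observation period.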