Quantifying Performance Changes with Effect Size Confidence Intervals
Tomas Kalibera, Richard Jones

TL;DR
This paper introduces a statistical framework for quantifying uncertainty in performance measurements, improving the rigor and interpretability of experimental results in systems research.
Contribution
It presents a novel statistical model that accounts for non-determinism and provides confidence intervals for performance ratios, enhancing reproducibility and validity.
Findings
Provides a method to compute confidence intervals for execution time ratios
Addresses non-determinism in performance measurements
Enables clearer, more reliable performance comparisons
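As a rough illustration of what a confidence interval for an execution time ratio looks like in practice, the sketch below uses a plain percentile bootstrap over two sets of timings. This is an assumption made for illustration, not the paper's own statistical model, which is not reproduced in this summary. The function name `bootstrap_ratio_ci`, its parameters, and the timings are all hypothetical, and the sketch treats every run as an independent sample, which ignores the non-determinism across executions and compilations that the paper highlights.

```python
import random
import statistics


def bootstrap_ratio_ci(old_times, new_times, confidence=0.95,
                       resamples=10_000, seed=0):
    """Percentile-bootstrap interval for mean(new_times) / mean(old_times).

    Illustrative only: assumes independent, identically distributed runs.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    ratios = []
    for _ in range(resamples):
        # Resample each system's timings with replacement.
        old_sample = rng.choices(old_times, k=len(old_times))
        new_sample = rng.choices(new_times, k=len(new_times))
        ratios.append(statistics.fmean(new_sample) / statistics.fmean(old_sample))
    ratios.sort()

    # Take the empirical quantiles of the resampled ratios as interval bounds.
    alpha = 1.0 - confidence
    lower = ratios[int((alpha / 2) * resamples)]
    upper = ratios[min(resamples - 1, int((1 - alpha / 2) * resamples))]
    point = statistics.fmean(new_times) / statistics.fmean(old_times)
    return point, lower, upper


if __name__ == "__main__":
    # Hypothetical execution times in seconds, not data from the paper.
    old = [10.2, 10.5, 9.9, 10.1, 10.4, 10.0, 10.3, 9.8]
    new = [9.1, 9.4, 8.9, 9.0, 9.3, 9.2, 8.8, 9.5]
    point, lower, upper = bootstrap_ratio_ci(old, new)
    print(f"ratio = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

An interval that lies entirely below 1 suggests the new system is faster under these assumptions, while an interval that spans 1 means the measurements do not support claiming a change in either direction.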
Abstract
Measuring performance and quantifying a performance change are core evaluation techniques in programming language and systems research. Of 122 recent scientific papers, as many as 65 included experimental evaluation that quantified a performance change using a ratio of execution times. Few of these papers evaluated their results with the level of rigour that has come to be expected in other experimental sciences. The uncertainty of measured results was largely ignored. Scarcely any of the papers mentioned uncertainty in the ratio of the mean execution times, and most did not even mention uncertainty in the two means themselves. Most of the papers failed to address the non-deterministic execution of computer programs (caused by factors such as memory placement), and none addressed non-deterministic compilation. It turns out that the statistical methods presented in the…
Taxonomy
Topics: Software System Performance and Reliability · Software Engineering Research · Parallel Computing and Optimization Techniques
