Benchmarking Blunders and Things That Go Bump in the Night
Neil J. Gunther

TL;DR
This paper discusses common pitfalls in benchmarking computer systems and uses real-world examples and simple performance models to show how systematic mistakes can be avoided, so that performance is assessed accurately.
Contribution
It introduces practical methods and models to identify and correct benchmarking errors, improving the reliability of performance measurements.
Findings
Benchmark flaws can be identified and corrected using simple performance models.
Misinterpretation of benchmark data often leads to incorrect conclusions.
Proper benchmarking practices prevent costly mistakes in system deployment.
Abstract
Benchmarking, by which I mean any computer system that is driven by a controlled workload, is the ultimate in performance testing and simulation. Aside from being a form of institutionalized cheating, it also offers countless opportunities for systematic mistakes in the way the workloads are applied and the resulting measurements are interpreted. "Right test, wrong conclusion" is a ubiquitous mistake that happens because test engineers tend to treat data as divine. Such reverence is not only misplaced but also a sure ticket to production hell when the application finally goes live. I demonstrate how such mistakes can be avoided by means of two war stories that are real WOPRs: (a) how to resolve benchmark flaws over the psychic hotline, and (b) how benchmarks can go flat with too much Java juice. In each case I present simple performance models and show how they can be applied to correctly…
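The abstract does not spell out which models the paper uses. As a generic illustration only, and not the author's own model, standard asymptotic bounds analysis for a closed, load-driven system shows why measured throughput must "go flat": once the busiest resource saturates, throughput cannot exceed 1/D_max no matter how many virtual users the load generator adds. The function names, the per-resource service demands, and the think time below are hypothetical values chosen for illustration.

```python
"""Asymptotic throughput and response-time bounds for a closed benchmark.

A generic sketch, not the paper's model. Assumes:
  demands    -- hypothetical service demand (seconds) per request at each resource
  think_time -- hypothetical client think time Z (seconds) between requests
"""


def throughput_bound(n_users, demands, think_time):
    """Upper bound on throughput X(N) in requests/second."""
    total = sum(demands)          # total demand D across all resources
    d_max = max(demands)          # demand at the bottleneck resource
    # Light load: X <= N / (D + Z); heavy load: X <= 1 / D_max.
    return min(n_users / (total + think_time), 1.0 / d_max)


def response_time_bound(n_users, demands, think_time):
    """Lower bound on response time R(N) in seconds."""
    total = sum(demands)
    d_max = max(demands)
    # R can never be below D, and under saturation grows as N * D_max - Z.
    return max(total, n_users * d_max - think_time)


def saturation_point(demands, think_time):
    """Load N* at which the two throughput asymptotes intersect."""
    return (sum(demands) + think_time) / max(demands)


if __name__ == "__main__":
    demands = [0.005, 0.020, 0.010]  # hypothetical: CPU, disk, network
    think_time = 1.0                 # hypothetical think time Z
    print(f"N* = {saturation_point(demands, think_time):.2f}")
    for n in (1, 10, 50, 100, 200):
        x = throughput_bound(n, demands, think_time)
        r = response_time_bound(n, demands, think_time)
        print(f"N={n:4d}  X<={x:7.2f} req/s  R>={r:6.3f} s")
```

With these made-up numbers, throughput climbs roughly linearly until about N* ≈ 52 users and is then capped at 50 requests/second, while the response-time bound rises linearly. A measured curve that flattens well below such a bound, or rises above it, is a cue to question the test setup rather than treat the data as divine.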
Taxonomy
Topics: Software System Performance and Reliability · Mobile Agent-Based Network Management · Software Testing and Debugging Techniques
