Benchmarking Crimes: An Emerging Threat in Systems Security
Erik van der Kouwe, Dennis Andriesse, Herbert Bos, Cristiano Giuffrida, Gernot Heiser

TL;DR
This paper identifies and analyzes widespread benchmarking mistakes in systems security research, highlighting their impact on reproducibility and validity, and emphasizes the need for improved benchmarking practices.
Contribution
The paper identifies 22 benchmarking crimes, conducts a reproducible survey of 50 defense papers published in top venues, and shows that these issues have remained prevalent over time.
Findings
Benchmarking crimes are common even in top-tier publications.
On average, papers commit five benchmarking crimes.
The prevalence of benchmarking crimes has remained constant over time.
Abstract
Properly benchmarking a system is a difficult and intricate task. Unfortunately, even a seemingly innocuous benchmarking mistake can compromise the guarantees provided by a given systems security defense and also put its reproducibility and comparability at risk. This threat is particularly insidious as it is generally not a result of malice and can easily go undetected by both authors and reviewers. Moreover, as modern defenses often trade off security for performance in an attempt to find an ideal design point in the performance-security space, the damage caused by benchmarking mistakes is increasingly worrisome. To analyze the magnitude of the phenomenon, we identify a set of 22 "benchmarking crimes" that threaten the validity of systems security evaluations and perform a survey of 50 defense papers published in top venues. To ensure the validity of our results, we perform the…
Taxonomy
Topics: Information and Cyber Security · Advanced Malware Detection Techniques · Security and Verification in Computing
