TL;DR
This paper investigates whether optimized microbenchmark suites can detect application performance changes as reliably as full application benchmarks, which would enable quicker detection of performance regressions during software development.
Contribution
It provides empirical evidence that microbenchmark suites can serve as proxies for application benchmarks when detecting performance changes, at the cost of some false positives.
Findings
Microbenchmark suites can detect most performance changes.
Optimized microbenchmark suites run faster but may produce false positives.
Application benchmarks and microbenchmarks do not always detect the same issues.
Abstract
Software performance changes are costly and often hard to detect pre-release. Similar to software testing frameworks, either application benchmarks or microbenchmarks can be integrated into quality assurance pipelines to detect performance changes before releasing a new application version. Unfortunately, extensive benchmarking studies usually take several hours, which is problematic when examining dozens of daily code changes in detail; hence, trade-offs have to be made. Optimized microbenchmark suites, which only include a small subset of the full suite, are a potential solution for this problem, provided that they still reliably detect the majority of application performance changes, such as increased request latency. It is, however, unclear whether microbenchmarks and application benchmarks detect the same performance problems and whether one can be a proxy for the other. In this paper,…
