Duet Instrumentation: An Agentic Approach to Improving Sensitivity in Cloud Service Benchmarking
Sebastian Koch, Nils Japke, David Bermbach

TL;DR
This paper introduces duet instrumentation, a novel LLM-enabled benchmarking approach that analyzes code changes to detect performance regressions in cloud services with greater sensitivity.
Contribution
It proposes a new LLM-based method for automated, targeted performance assessment of code changes, improving sensitivity over traditional benchmarks.
Findings
Achieves 58% precision, 93% recall, and 71% specificity in identifying code changes.
Detects performance regressions up to 5x less severe than those traditional methods can detect.
Maintains similar latency distributions while improving regression detection sensitivity.
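For readers less familiar with the three classification metrics quoted above, the following minimal sketch shows how precision, recall, and specificity are derived from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the paper's data and do not reproduce the reported percentages.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute precision, recall, and specificity from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, and
    false negatives. A metric whose denominator is zero is returned as NaN.
    """

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else math.nan

    return {
        # Of everything flagged as positive, how much was actually positive?
        "precision": ratio(tp, tp + fp),
        # Of everything actually positive, how much was flagged?
        "recall": ratio(tp, tp + fn),
        # Of everything actually negative, how much was correctly left unflagged?
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical counts for illustration only (not taken from the paper).
metrics = classification_metrics(tp=8, fp=2, tn=6, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

High recall combined with moderate precision, as reported in the findings, would correspond to few false negatives but a noticeable number of false positives.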
Abstract
Continuous cloud service performance benchmarking is essential for detecting performance bugs early, before they are deployed to production. However, detecting performance regressions using application benchmarks, which usually treat the system under test as a black box, is challenging due to variable I/O calls or changing performance characteristics of the underlying cloud infrastructure. Microbenchmarks are often more sensitive and accurate, but also more time-consuming to implement and run. Further, they do not capture the performance of the integrated system as a whole. A comprehensive performance assessment therefore typically requires a combination of both approaches. To address the shortcomings of application benchmarks, we propose duet instrumentation, a novel benchmarking paradigm enabled by recent advancements in large language model (LLM) code understanding. The idea is to…
