Duet instrumentation: An Agentic Approach to Improving Sensitivity in Cloud Service Benchmarking

Sebastian Koch; Nils Japke; David Bermbach

arXiv:2605.18397·cs.DC·May 19, 2026

TL;DR

This paper introduces duet instrumentation, a novel LLM-enabled benchmarking approach that analyzes code changes to detect performance regressions more sensitively in cloud applications.

Contribution

It proposes a new LLM-based method for automated, targeted performance assessment of code changes that achieves higher sensitivity than traditional application benchmarks.

Findings

1. Achieves 58% precision, 93% recall, and 71% specificity in identifying code changes.
2. Detects performance regressions of up to 5x lower severity than traditional methods can.
3. Maintains similar latency distributions while improving regression detection sensitivity.
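The precision, recall, and specificity figures in the findings follow the standard confusion-matrix definitions. The minimal sketch below shows how the three metrics are computed. It treats "the method flags a change" as the positive class, which is an assumption, and the counts in the usage example are hypothetical, not data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical confusion-matrix counts for a change-classification task."""
    tp: int  # flagged, and actually relevant
    fp: int  # flagged, but not relevant
    fn: int  # relevant, but not flagged
    tn: int  # neither flagged nor relevant


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a class is empty.
    return num / den if den else 0.0


def precision(c: ConfusionCounts) -> float:
    # Share of flagged changes that were actually relevant.
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Share of relevant changes that were flagged (sensitivity).
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of irrelevant changes that were correctly not flagged.
    return _ratio(c.tn, c.tn + c.fp)


if __name__ == "__main__":
    counts = ConfusionCounts(tp=8, fp=2, fn=2, tn=8)  # illustrative only
    print(precision(counts), recall(counts), specificity(counts))
```

Note that a method can trade precision for recall. In a CI setting, a high recall with moderate precision means few relevant changes are missed, at the cost of some changes being flagged unnecessarily.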

Abstract

Continuous cloud service performance benchmarking is essential for detecting performance bugs early, before they are deployed to production. However, detecting performance regressions with application benchmarks, which usually treat the system under test as a black box, is challenging due to variable I/O calls and the changing performance characteristics of the underlying cloud infrastructure. Microbenchmarks are often more sensitive and accurate, but they are also more time-consuming to implement and run. Further, they do not capture the performance of the integrated system as a whole. A comprehensive performance assessment therefore typically requires a combination of both approaches. To address the shortcomings of application benchmarks, we propose duet instrumentation, a novel benchmarking paradigm enabled by recent advancements in large language model (LLM) code understanding. The idea is to…
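One generic way to decide whether a code change slowed a service down, despite noisy cloud performance, is to collect paired latency measurements of the old and new versions and bootstrap a confidence interval for their relative change. The sketch below shows that general technique under stated assumptions. It is not the duet instrumentation method, whose details the abstract does not give. The function names, the use of the median of paired ratios, and the 95% confidence level are all hypothetical choices.

```python
import random
import statistics


def relative_change_ci(baseline, candidate, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap a CI for the median relative change (candidate vs. baseline).

    baseline[i] and candidate[i] are assumed to be latencies measured as a
    pair, e.g. under the same conditions, so that shared noise cancels out
    in their ratio.
    """
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("need equally sized, non-empty paired samples")
    ratios = [c / b for b, c in zip(baseline, candidate)]
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    medians = sorted(
        statistics.median(rng.choices(ratios, k=len(ratios)))
        for _ in range(n_boot)
    )
    lo = medians[int((alpha / 2) * n_boot)]
    hi = medians[int((1 - alpha / 2) * n_boot) - 1]
    # Convert ratios to relative change, e.g. 1.05 -> +5%.
    return lo - 1.0, hi - 1.0


def is_regression(baseline, candidate, **kwargs):
    # Report a regression only if the whole interval lies above zero.
    lo, _ = relative_change_ci(baseline, candidate, **kwargs)
    return lo > 0.0
```

Pairing matters here because a ratio of two measurements taken under the same conditions largely removes interference that affects both versions equally. Comparing two independently collected distributions does not remove that shared noise.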
