BayesPerf: Minimizing Performance Monitoring Errors Using Bayesian Statistics
Subho S. Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer

TL;DR
BayesPerf uses Bayesian modeling to accurately quantify and reduce errors in hardware performance counter measurements, enabling more reliable system monitoring and decision-making.
Contribution
The paper introduces BayesPerf, a Bayesian framework with an accompanying hardware accelerator that improves the accuracy of hardware performance counter (HPC) measurements by modeling measurement uncertainty and the microarchitectural relationships between counters.
Findings
Reduces the average error in HPC measurements from 40.1% to 7.6% when events are being multiplexed.
Provides low-latency, low-power inference for x86 and ppc64 CPUs.
Demonstrates practical benefits in real-time system scheduling.
Abstract
Hardware performance counters (HPCs) that measure low-level architectural and microarchitectural events provide dynamic contextual information about the state of the system. However, HPC measurements are error-prone due to nondeterminism (e.g., undercounting due to event multiplexing, or OS interrupt-handling behaviors). In this paper, we present BayesPerf, a system for quantifying uncertainty in HPC measurements by using a domain-driven Bayesian model that captures microarchitectural relationships between HPCs to jointly infer their values as probability distributions. We provide the design and implementation of an accelerator that allows for low-latency and low-power inference of the BayesPerf model for x86 and ppc64 CPUs. BayesPerf reduces the average error in HPC measurements from 40.1% to 7.6% when events are being multiplexed. The value of BayesPerf in real-time decision-making…
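A toy sketch may help make two ideas from the abstract concrete: why multiplexed counts are only estimates, and how a known relationship between counters can tighten those estimates. This is not the BayesPerf model or its accelerator. It assumes independent Gaussian uncertainty on each counter and a single illustrative invariant (hits + misses = accesses); the function names `scale_multiplexed` and `fuse_with_constraint` and all numbers are hypothetical.

```python
def scale_multiplexed(raw_count, time_enabled, time_running):
    """Extrapolate a multiplexed counter to the full interval.

    A counter that was scheduled for only part of the interval is scaled
    by time_enabled / time_running. The result is an estimate, which is
    one source of the measurement error discussed in the abstract.
    """
    if time_running <= 0:
        raise ValueError("counter never ran; no estimate is possible")
    return raw_count * time_enabled / time_running


def fuse_with_constraint(means, variances, coeffs):
    """Condition independent Gaussian estimates on sum(c_i * x_i) = 0.

    Returns the posterior means and variances. Assumption for this
    sketch: a single linear invariant and independent Gaussian noise.
    """
    residual = sum(c * m for c, m in zip(coeffs, means))
    scale = sum(c * c * v for c, v in zip(coeffs, variances))
    post_means = [m - v * c * residual / scale
                  for m, v, c in zip(means, variances, coeffs)]
    post_vars = [v - (v * c) ** 2 / scale
                 for v, c in zip(variances, coeffs)]
    return post_means, post_vars


# Hypothetical readings: hits and misses were multiplexed (high variance),
# while total accesses were counted for the whole interval (low variance).
measured_means = [900.0, 150.0, 1000.0]      # hits, misses, accesses
measured_vars = [10000.0, 2500.0, 100.0]
invariant = [1.0, 1.0, -1.0]                 # hits + misses - accesses = 0

post_means, post_vars = fuse_with_constraint(
    measured_means, measured_vars, invariant)
print("posterior means:", post_means)
print("posterior variances:", post_vars)
```

After fusion, the posterior means satisfy the invariant exactly, and every variance is smaller than its measured counterpart. The least certain counters move the most, which is the intuition behind inferring counter values jointly rather than one at a time.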
