BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection
Luan Pham, Huong Ha, Hongyu Zhang

TL;DR
BARO is an integrated approach for microservice failure troubleshooting that combines multivariate Bayesian change point detection with a robust statistical test to improve anomaly detection and root cause localization accuracy.
Contribution
BARO is an end-to-end framework that improves root cause analysis in two ways: it models the dependencies among metrics, and it makes root cause localization less sensitive to anomaly detection errors.
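The robust statistical test can be illustrated with a minimal sketch. It assumes the anomaly onset is already known, for example from change point detection. For each metric, the median and interquartile range (IQR) are estimated from the normal period before the onset. The metric is then scored by its largest post-onset deviation, measured in IQR units. The median/IQR choice, the function names `robust_score` and `rank_root_causes`, and the example metrics are illustrative assumptions, not BARO's exact implementation.

```python
import statistics
from typing import Dict, List, Tuple


def robust_score(series: List[float], onset: int, eps: float = 1e-9) -> float:
    """Largest post-onset deviation from the pre-onset median, in IQR units."""
    normal, abnormal = series[:onset], series[onset:]
    median = statistics.median(normal)
    q1, _, q3 = statistics.quantiles(normal, n=4)  # needs >= 2 normal points
    iqr = max(q3 - q1, eps)  # avoid dividing by zero for flat metrics
    return max(abs(x - median) / iqr for x in abnormal)


def rank_root_causes(metrics: Dict[str, List[float]], onset: int) -> List[Tuple[str, float]]:
    """Rank metrics (and hence their services) by robust score, highest first."""
    scores = {name: robust_score(values, onset) for name, values in metrics.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical metrics; the onset index (6) would come from anomaly detection.
metrics = {
    "cart_cpu": [10, 11, 10, 12, 11, 10, 45, 50],
    "cart_mem": [200, 202, 201, 203, 200, 202, 204, 203],
    "db_latency": [5, 5, 6, 5, 6, 5, 7, 6],
}
ranking = rank_root_causes(metrics, onset=6)
print([name for name, _ in ranking])  # ['cart_cpu', 'db_latency', 'cart_mem']
```

The median and IQR barely move when a few points near the boundary land on the wrong side of the onset. As a result, a slightly misplaced onset tends to shift these scores only a little. That is the intuition for pairing anomaly detection with a robust test rather than with mean and standard deviation.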
Findings
Outperforms existing methods in anomaly detection accuracy.
Achieves more precise root cause localization.
Demonstrates robustness across multiple benchmark systems.
Abstract
Detecting failures and identifying their root causes promptly and accurately is crucial for ensuring the availability of microservice systems. A typical failure troubleshooting pipeline for microservices consists of two phases: anomaly detection and root cause analysis. While many existing root cause analysis methods require accurate anomaly detection, anomaly detection techniques offer no guarantee of accurate estimates, and inaccurate anomaly detection results can significantly degrade root cause localization. To address this challenge, we propose BARO, an end-to-end approach that integrates anomaly detection and root cause analysis for effectively troubleshooting failures in microservice systems. BARO leverages the Multivariate Bayesian Online Change Point Detection technique to model the dependency within multivariate time-series metrics data, enabling it to…
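The run-length recursion behind Bayesian online change point detection can be sketched in pure Python. At each step it updates a posterior over "time since the last change". This minimal sketch makes several assumptions:

- It uses a Normal-Inverse-Wishart (NIW) conjugate prior over the whole metric vector, so the predictive density uses a full covariance and captures correlations between metrics.
- The hazard rate is constant.
- Metrics are assumed to be standardized around zero.

The sketch reports the start of the current regime as the anomaly onset. The function names, prior values, hazard, and synthetic CPU/latency data are hypothetical and do not represent BARO's actual configuration.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
NIW = Tuple[Vector, float, float, Matrix]  # (mean, kappa, nu, psi) hyperparameters


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def mvt_logpdf(x: Vector, mu: Vector, sigma: Matrix, df: float) -> float:
    """Log density of a multivariate Student-t with scale matrix sigma."""
    d, low = len(x), cholesky(sigma)
    z: Vector = []  # forward substitution: solve low @ z = x - mu
    for i in range(d):
        z.append((x[i] - mu[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    maha = sum(v * v for v in z)  # (x - mu)^T sigma^-1 (x - mu)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return (math.lgamma((df + d) / 2) - math.lgamma(df / 2) - 0.5 * d * math.log(df * math.pi)
            - 0.5 * log_det - 0.5 * (df + d) * math.log1p(maha / df))


def niw_predictive(p: NIW, x: Vector) -> float:
    """Posterior predictive of the NIW model, which is a multivariate Student-t."""
    mu, kappa, nu, psi = p
    df = nu - len(x) + 1.0
    scale = (kappa + 1.0) / (kappa * df)
    return mvt_logpdf(x, mu, [[scale * v for v in row] for row in psi], df)


def niw_update(p: NIW, x: Vector) -> NIW:
    """Conjugate NIW update with one observation (rank-one update of psi)."""
    mu, kappa, nu, psi = p
    diff = [xi - mi for xi, mi in zip(x, mu)]
    w = kappa / (kappa + 1.0)
    new_mu = [(kappa * mi + xi) / (kappa + 1.0) for xi, mi in zip(x, mu)]
    new_psi = [[psi[i][j] + w * diff[i] * diff[j] for j in range(len(x))] for i in range(len(x))]
    return new_mu, kappa + 1.0, nu + 1.0, new_psi


def logsumexp(vals: List[float]) -> float:
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def bocpd_map_run_lengths(data: List[Vector], hazard: float = 0.01) -> List[int]:
    """Run-length recursion with a constant hazard; MAP run length after each step."""
    d = len(data[0])
    identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    prior: NIW = ([0.0] * d, 1.0, d + 1.0, identity)  # assumes standardized metrics
    log_r: List[float] = [0.0]  # log P(run length = 0) before any data
    params: List[NIW] = [prior]  # one NIW posterior per candidate run length
    map_rl: List[int] = []
    for x in data:
        joint = [lr + niw_predictive(p, x) for lr, p in zip(log_r, params)]
        growth = [j + math.log1p(-hazard) for j in joint]  # current run continues
        change = logsumexp(joint) + math.log(hazard)  # a change point resets the run
        unnorm = [change] + growth
        norm = logsumexp(unnorm)
        log_r = [v - norm for v in unnorm]
        params = [prior] + [niw_update(p, x) for p in params]
        map_rl.append(max(range(len(log_r)), key=lambda r: log_r[r]))
    return map_rl


def estimate_onset(data: List[Vector]) -> int:
    """Start index of the current regime, read off the final MAP run length."""
    return len(data) - bocpd_map_run_lengths(data)[-1]


# Hypothetical metrics: latency normally tracks CPU; from t = 50 latency jumps
# while CPU stays flat, breaking that relationship.
rng = random.Random(7)
series: List[Vector] = []
for t in range(100):
    cpu = rng.gauss(0.0, 1.0)
    latency = 0.8 * cpu + 0.3 * rng.gauss(0.0, 1.0) + (3.0 if t >= 50 else 0.0)
    series.append([cpu, latency])

onset = estimate_onset(series)
print("estimated anomaly onset:", onset)
```

The synthetic fault mainly breaks the CPU and latency relationship. A full-covariance model generally rates such points as far less likely than a model with independent per-metric variances would. The run-length list grows by one entry per step, so the total work is quadratic in the series length. A practical version would prune low-probability run lengths.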
Taxonomy
Topics: Software System Performance and Reliability · Network Security and Intrusion Detection · Data Stream Mining Techniques
