Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?
Luan Pham, Huong Ha, Hongyu Zhang

TL;DR
This paper evaluates nine causal discovery methods and twenty-one root cause analysis methods for microservice systems, finding that no single method excels universally and highlighting the gap between performance on synthetic datasets and on real systems.
Contribution
It provides a comprehensive empirical evaluation of existing causal inference methods for microservice root cause analysis, identifying their limitations and areas for future improvement.
Findings
No method outperforms others in all scenarios
Performance varies significantly across datasets and parameters
Synthetic dataset results may not reflect real system performance
Abstract
Microservice architecture has become a popular architecture adopted by many cloud applications. However, identifying the root cause of a failure in microservice systems is still a challenging and time-consuming task. In recent years, researchers have introduced various causal inference-based root cause analysis methods to assist engineers in identifying the root causes. To gain a better understanding of the current status of causal inference-based root cause analysis techniques for microservice systems, we conduct a comprehensive evaluation of nine causal discovery methods and twenty-one root cause analysis methods. Our evaluation aims to understand both the effectiveness and efficiency of causal inference-based root cause analysis methods, as well as other factors that affect their performance. Our experimental results and analyses indicate that no method stands out in all situations;…
Taxonomy
Topics: Software System Performance and Reliability
