TORAI: Multi-source Root Cause Analysis for Blind Spots in Microservice Service Call Graph
Luan Pham, Huong Ha, Xiuzhen Zhang, Hongyu Zhang

TL;DR
TORAI is an unsupervised root cause analysis method for microservice systems that identifies root causes without relying on a service call graph, so it still works when some services are blind spots with no trace data.
Contribution
It introduces a novel approach that uses multi-source telemetry data and clustering to diagnose root causes without needing a constructed call graph.
Findings
TORAI outperforms state-of-the-art baselines in benchmark tests.
In real-world failures, it pinpoints the root cause within its top-3 recommendations.
The method is effective even with services lacking trace data.
Abstract
Existing multi-source root cause analysis (RCA) methods for microservice systems assume all services have traces to construct a service call graph. However, this assumption is not practical as microservice systems evolve rapidly and may contain blackbox services without traces, such as compiled software or unsupported services. We refer to these services as blind spots. In the presence of blind spots, the performance of existing multi-source RCA methods may be affected, as they only diagnose visible services on the call graph. To overcome this limitation, we propose TORAI, a novel unsupervised approach that effectively pinpoints fine-grained root causes without relying on the service call graph. Instead, TORAI first measures anomaly severity using available multi-source telemetry data. It then performs clustering to group services based on their severity symptoms and conducts causal…
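The first two steps named in the abstract (scoring anomaly severity from whatever telemetry a service has, then grouping services by severity) can be illustrated with a minimal sketch. This is not the paper's algorithm: the z-score severity measure, the largest-gap split used as a stand-in for clustering, the service names, and the sample values are all assumptions made for illustration. The causal step is not sketched, because the abstract is truncated before describing it.

```python
from statistics import mean, stdev

def severity(normal, anomalous):
    """Assumed severity measure: largest absolute z-score of the anomalous
    samples relative to the normal-period samples."""
    mu = mean(normal)
    sigma = stdev(normal)
    if sigma == 0:
        sigma = 1e-9  # avoid division by zero on flat signals
    return max(abs(x - mu) / sigma for x in anomalous)

def score_services(telemetry):
    """telemetry: {service: {signal_name: (normal_samples, anomalous_samples)}}.
    A service is scored by its most severe signal, whatever sources it has."""
    return {
        svc: max(severity(n, a) for n, a in signals.values())
        for svc, signals in telemetry.items()
    }

def split_by_largest_gap(scores):
    """Stand-in for clustering: rank services by severity and cut the ranking
    at the largest drop between neighbours."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) < 2:
        return ranked, []
    gaps = [scores[ranked[i]] - scores[ranked[i + 1]] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ranked[:cut], ranked[cut:]

# Hypothetical data. "legacy-billing" is a blind spot: it has metrics but no traces.
telemetry = {
    "cart": {"latency_ms": ([10, 11, 9, 10, 10], [14, 15, 13])},
    "payment": {"latency_ms": ([20, 21, 19, 20, 20], [21, 20, 22])},
    "legacy-billing": {"cpu_pct": ([30, 31, 29, 30, 30], [90, 95, 92])},
}

scores = score_services(telemetry)
suspects, others = split_by_largest_gap(scores)
print(suspects, others)
```

Because scoring uses only per-service signals, a service without traces is ranked alongside the others instead of being dropped for not appearing on a call graph.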
