Root Cause Analysis for Microservice Systems via Cascaded Conditional Learning with Hypergraphs
Shuaiyu Xie, Hanbin He, Jian Wang, Bing Li

TL;DR
This paper introduces CCLH, a framework for root cause analysis in microservice systems that models the causal dependency between diagnostic tasks with cascaded conditional learning and the group influences among instances with hypergraphs, improving diagnostic accuracy.
Contribution
The paper proposes a cascaded conditional learning framework with hypergraph modeling that better captures both the causal dependencies between diagnostic tasks and the group influences among service instances in microservice root cause analysis.
Findings
CCLH outperforms existing methods on both root cause localization (RCL) and failure type identification (FTI).
Hypergraph modeling effectively captures group influences.
Cascaded conditional learning improves inter-task collaboration.
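The findings credit cascaded conditional learning with better inter-task collaboration. The core idea, that failure type identification (FTI) should depend on the output of root cause localization (RCL) rather than being predicted side by side with it, can be illustrated with a minimal inference-time sketch. It assumes RCL yields one score per instance and an FTI head yields failure-type scores conditioned on each candidate root cause. The function names, score dictionaries, and the marginalization step are hypothetical illustrations, not CCLH's actual architecture or training procedure.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cascaded_diagnosis(
    rcl_scores: Dict[str, float],
    fti_scores: Dict[str, Dict[str, float]],
) -> Tuple[str, str, Dict[str, float]]:
    """Hypothetical two-stage cascade: RCL first, then FTI conditioned on RCL.

    rcl_scores: instance -> root-cause score (stage 1, assumed input).
    fti_scores: instance -> {failure type -> score}, i.e. scores for
                p(type | root cause = instance) (stage 2, assumed input).
    """
    instances = list(rcl_scores)
    # Stage 1: distribution over which instance is the root cause.
    p_root = dict(zip(instances, softmax([rcl_scores[i] for i in instances])))

    # Stage 2: p(t) = sum_i p(root = i) * p(t | root = i).
    failure_types = list(next(iter(fti_scores.values())))
    p_type = {t: 0.0 for t in failure_types}
    for inst in instances:
        cond = dict(zip(failure_types,
                        softmax([fti_scores[inst][t] for t in failure_types])))
        for t in failure_types:
            p_type[t] += p_root[inst] * cond[t]

    root = max(p_root, key=p_root.get)
    ftype = max(p_type, key=p_type.get)
    return root, ftype, p_type


if __name__ == "__main__":
    rcl = {"svc-a": 2.0, "svc-b": 0.0, "svc-c": -1.0}
    fti = {
        "svc-a": {"cpu": 3.0, "network": 0.0},
        "svc-b": {"cpu": 0.0, "network": 3.0},
        "svc-c": {"cpu": 0.0, "network": 3.0},
    }
    root, ftype, _ = cascaded_diagnosis(rcl, fti)
    print(root, ftype)  # → svc-a cpu
```

Marginalizing over candidate root causes, instead of hard-selecting the top one before running FTI, is one design choice among several: it lets the failure-type prediction stay cautious when localization is uncertain, while still letting a confident localization dominate the result.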
Abstract
Root cause analysis in microservice systems typically involves two core tasks: root cause localization (RCL) and failure type identification (FTI). Despite substantial research efforts, conventional diagnostic approaches still face two key challenges. First, these methods predominantly adopt a joint learning paradigm for RCL and FTI to exploit shared information and reduce training time. However, this simplistic integration neglects the causal dependencies between tasks, thereby impeding inter-task collaboration and information transfer. Second, these existing methods primarily focus on point-to-point relationships between instances, overlooking the group nature of inter-instance influences induced by deployment configurations and load balancing. To overcome these limitations, we propose CCLH, a novel root cause analysis framework that orchestrates diagnostic tasks based on cascaded…
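The abstract attributes group influences to deployment configurations and load balancing. One way to picture this is a hypergraph in which every set of instances sharing a host (deployment) or sitting behind the same service (load balancing) forms a single hyperedge, so an anomaly propagates to the whole group rather than along individual point-to-point edges. The sketch below is an assumption-laden illustration: `build_hyperedges`, `hypergraph_propagate`, the host/service grouping rule, and the unweighted mean aggregation (node to hyperedge, then hyperedge to node) are hypothetical and are not taken from CCLH.

```python
from collections import defaultdict
from typing import Dict, List


def build_hyperedges(
    host_of: Dict[str, str],
    service_of: Dict[str, str],
) -> List[List[str]]:
    """Hypothetical grouping: one hyperedge per host and one per service."""
    groups = defaultdict(list)
    for inst, host in host_of.items():
        groups[("host", host)].append(inst)  # co-deployed instances
    for inst, svc in service_of.items():
        groups[("service", svc)].append(inst)  # load-balanced replicas
    # A single-member hyperedge carries no group information, so drop it.
    return [sorted(members) for members in groups.values() if len(members) > 1]


def hypergraph_propagate(
    features: Dict[str, List[float]],
    hyperedges: List[List[str]],
) -> Dict[str, List[float]]:
    """One unweighted step: average members into each hyperedge, then
    average each node's incident hyperedge messages back into the node."""
    dim = len(next(iter(features.values())))
    incident = defaultdict(list)  # node -> messages from its hyperedges
    for edge in hyperedges:
        msg = [sum(features[n][k] for n in edge) / len(edge) for k in range(dim)]
        for n in edge:
            incident[n].append(msg)
    out = {}
    for node, x in features.items():
        msgs = incident.get(node)
        if not msgs:
            out[node] = list(x)  # assumption: isolated instances keep their own features
        else:
            out[node] = [sum(m[k] for m in msgs) / len(msgs) for k in range(dim)]
    return out


if __name__ == "__main__":
    host_of = {"cart-1": "node-A", "cart-2": "node-B", "pay-1": "node-A", "pay-2": "node-B"}
    service_of = {"cart-1": "cart", "cart-2": "cart", "pay-1": "payment", "pay-2": "payment"}
    edges = build_hyperedges(host_of, service_of)
    # Only cart-1 shows an anomaly (e.g. a normalized latency spike).
    feats = {"cart-1": [1.0], "cart-2": [0.0], "pay-1": [0.0], "pay-2": [0.0]}
    print(hypergraph_propagate(feats, edges))
    # → {'cart-1': [0.5], 'cart-2': [0.25], 'pay-1': [0.25], 'pay-2': [0.0]}
```

In this toy example the anomaly on `cart-1` reaches `pay-1` (same host) and `cart-2` (same service) but not `pay-2`, which shares neither group; a pairwise graph would need an explicit edge for every such pair to express the same structure.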
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Software-Defined Networks and 5G
