Online Multi-modal Root Cause Identification in Microservice Systems
Lecheng Zheng, Zhengzhang Chen, Haifeng Chen

TL;DR
The paper presents OCEAN, an online multi-modal causal structure learning approach for root cause analysis in microservice systems. It combines neural networks and attention mechanisms to localize failures in real time.
Contribution
Introduces OCEAN, a novel online multi-modal causal structure learning method combining neural networks, attention, and graph fusion for real-time root cause analysis in complex systems.
Findings
Effective on real-world datasets
Outperforms existing online RCA methods
Computationally efficient
Abstract
Root Cause Analysis (RCA) is essential for pinpointing the root causes of failures in microservice systems. Traditional data-driven RCA methods are typically limited to offline applications due to high computational demands, and existing online RCA methods handle only single-modal data, overlooking complex interactions in multi-modal systems. In this paper, we introduce OCEAN, a novel online multi-modal causal structure learning method for root cause localization. OCEAN employs a dilated convolutional neural network to capture long-term temporal dependencies and graph neural networks to learn causal relationships among system entities and key performance indicators. We further design a multi-factor attention mechanism to analyze and reassess the relationships among different metrics and log indicators/attributes for enhanced online causal graph learning. Additionally, a contrastive…
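The abstract credits a dilated convolutional network with capturing long-term temporal dependencies. The sketch below is not OCEAN's implementation. It only illustrates, in plain Python, why stacking dilated causal convolutions widens the window of past time steps a layer can see. The kernel values, the dilation schedule (1, 2, 4), and the KPI trace are all assumptions chosen for illustration.

```python
def dilated_causal_conv1d(series, kernel, dilation):
    """Causal 1-D convolution with gaps of `dilation` steps between taps.

    Output at time t is sum_k kernel[k] * series[t - k * dilation];
    positions before the start of the series are treated as zero.
    """
    out = []
    for t in range(len(series)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * series[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of time steps visible to one output after stacking the layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical KPI trace: a single spike at t = 0, then silence.
trace = [1.0] + [0.0] * 15

hidden = trace
for d in (1, 2, 4):  # dilation doubles per layer (an assumption)
    hidden = dilated_causal_conv1d(hidden, kernel=[1.0, 1.0], dilation=d)

rf = receptive_field(kernel_size=2, dilations=(1, 2, 4))
print(rf)      # number of past steps covered by the three-layer stack
print(hidden)  # time steps at which the spike is still visible
```

With a kernel of size 2, three layers cover 8 time steps, whereas three undilated layers would cover only 4. The spike at t = 0 therefore remains visible in the output for the first 8 positions and vanishes afterwards.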
Taxonomy
Topics: Risk and Safety Analysis
Methods: Softmax · Attention Is All You Need
