Eadro: An End-to-End Troubleshooting Framework for Microservices on Multi-source Data
Cheryl Lee, Tianyi Yang, Zhuangbin Chen, Yuxin Su, and Michael R. Lyu

TL;DR
Eadro is an end-to-end framework that integrates anomaly detection and root cause localization using multi-source data (traces, logs, and KPIs), improving troubleshooting accuracy in large-scale microservice systems.
Contribution
It introduces the first integrated approach combining multi-source data and joint modeling for anomaly detection and localization in microservices.
Findings
Outperforms state-of-the-art methods by a large margin
Effectively leverages multi-source data for better troubleshooting
Demonstrates the importance of joint detection and localization
Abstract
The complexity and dynamism of microservices pose significant challenges to system reliability, making automated troubleshooting crucial. In particular, effective root cause localization after anomaly detection is essential for ensuring the reliability of microservice systems. However, two significant issues remain in existing approaches: (1) Microservices generate traces, system logs, and key performance indicators (KPIs), but existing approaches usually consider traces only and so fail to understand the system fully, as traces cannot depict all anomalies; (2) Troubleshooting microservices generally consists of two main phases, i.e., anomaly detection and root cause localization. Existing studies regard these two phases as independent, ignoring their close correlation. Even worse, inaccurate detection results can severely degrade localization effectiveness. To overcome these limitations, we propose Eadro,…
Taxonomy
Topics: Software System Performance and Reliability · Network Security and Intrusion Detection · Cloud Computing and Resource Management
