CMMD: Cross-Metric Multi-Dimensional Root Cause Analysis
Shifu Yan, Caihua Shan, Wenyi Yang, Bixiong Xu, Dongsheng Li, Lili Qiu, Jie Tong, Qi Zhang

TL;DR
This paper introduces CMMD, a novel cross-metric multi-dimensional root cause analysis method that leverages graph neural networks and genetic algorithms to improve accuracy in identifying causes of KPI anomalies in large-scale online services.
Contribution
The paper presents a new RCA approach that models complex metric relationships with GNNs and localizes root causes using genetic algorithms, addressing limitations of previous methods.
Findings
CMMD outperforms baseline methods on both synthetic and real datasets.
CMMD effectively models complex metric relationships.
CMMD is deployed as an online service in Microsoft Azure.
Abstract
In large-scale online services, crucial metrics, a.k.a. key performance indicators (KPIs), are monitored periodically to check their running statuses. Generally, KPIs are aggregated along multiple dimensions and derived by complex calculations among fundamental metrics from the raw data. Once abnormal KPI values are observed, root cause analysis (RCA) can be applied to identify the reasons for anomalies, so that we can troubleshoot quickly. Recently, several automatic RCA techniques were proposed to localize the related dimensions (or a combination of dimensions) to explain the anomalies. However, their analyses are limited to the data on the abnormal metric and ignore the data of other metrics which may also be related to the anomalies, leading to imprecise or even incorrect root causes. To this end, we propose a cross-metric multi-dimensional root cause analysis method, named CMMD,…
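The setting the abstract describes, a KPI derived from fundamental metrics and aggregated along dimensions, can be illustrated with a small sketch. This is not CMMD itself: the records, the dimension names (region, device), the metric names (errors, requests), and the helper functions are all hypothetical and chosen only to show why a derived KPI depends on more than one underlying metric.

```python
from collections import defaultdict

# Hypothetical raw records: dimension values plus two fundamental metrics.
RAW = [
    {"region": "us", "device": "mobile", "errors": 5, "requests": 100},
    {"region": "us", "device": "desktop", "errors": 1, "requests": 100},
    {"region": "eu", "device": "mobile", "errors": 30, "requests": 100},
    {"region": "eu", "device": "desktop", "errors": 2, "requests": 100},
]


def aggregate(rows, dims):
    """Sum the fundamental metrics over every combination of the given dimensions."""
    totals = defaultdict(lambda: {"errors": 0, "requests": 0})
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key]["errors"] += row["errors"]
        totals[key]["requests"] += row["requests"]
    return dict(totals)


def error_rate(cell):
    """Derived KPI (assumed for illustration): ratio of two fundamental metrics."""
    return cell["errors"] / cell["requests"]


overall = aggregate(RAW, [])                       # no dimensions: one global cell
by_region = aggregate(RAW, ["region"])             # aggregated along one dimension
by_region_device = aggregate(RAW, ["region", "device"])  # a combination of dimensions

for key, cell in sorted(by_region_device.items()):
    print(key, round(error_rate(cell), 3))
```

In this toy data the elevated error rate traces to the (eu, mobile) cell, but only because the errors metric moved while requests stayed flat. An analysis restricted to the derived KPI alone would not see which fundamental metric drove the change, which is the limitation the abstract attributes to earlier single-metric approaches.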
Taxonomy
Methods · Graph Neural Network
