Practical Anomaly Detection over Multivariate Monitoring Metrics for Online Services
Jinyang Liu, Tianyi Yang, Zhuangbin Chen, Yuxin Su, Cong Feng, Zengyin Yang, Michael R. Lyu

TL;DR
This paper introduces CMAnomaly, a novel anomaly detection framework for multivariate monitoring metrics that efficiently captures metric dependencies and historical patterns, significantly improving detection accuracy and speed in large-scale online services.
Contribution
The paper presents CMAnomaly, a framework built on a collaborative machine that models pairwise interactions along both the feature and temporal dimensions of the metrics in linear time, making it efficient enough for industrial anomaly detection.
Findings
Achieves an average F1 score of 0.9494, outperforming baselines by 6.77% to 10.68%.
Runs 10 to 20 times faster than existing methods.
Successfully deployed in Huawei Cloud for large-scale online service monitoring.
Abstract
As modern software systems continue to grow in terms of complexity and volume, anomaly detection on multivariate monitoring metrics, which profile systems' health status, becomes more and more critical and challenging. In particular, the dependency between different metrics and their historical patterns plays a critical role in pursuing prompt and accurate anomaly detection. Existing approaches fall short of industrial needs for being unable to capture such information efficiently. To fill this significant gap, in this paper, we propose CMAnomaly, an anomaly detection framework on multivariate monitoring metrics based on collaborative machine. The proposed collaborative machine is a mechanism to capture the pairwise interactions along with feature and temporal dimensions with linear time complexity. Cost-effective models can then be employed to leverage both the dependency between…
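The abstract's claim of capturing pairwise interactions in linear time can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes the collaborative machine relies on an algebraic identity of the kind used in factorization machines, and the function names, the latent vectors `v`, and the idea of flattening a window of metrics into a single vector `x` are all hypothetical choices made for illustration.

```python
from itertools import combinations


def pairwise_naive(x, v):
    """Sum of <v_i, v_j> * x_i * x_j over all pairs i < j.

    Cost is O(n^2 * k) for n inputs and k latent dimensions.
    """
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(v[i], v[j]))
        total += dot * x[i] * x[j]
    return total


def pairwise_linear(x, v):
    """Same quantity in O(n * k), using the identity

    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    """
    k = len(v[0])
    total = 0.0
    for f in range(k):
        s = 0.0      # running sum of v_if * x_i
        s_sq = 0.0   # running sum of (v_if * x_i)^2
        for xi, vi in zip(x, v):
            term = vi[f] * xi
            s += term
            s_sq += term * term
        total += 0.5 * (s * s - s_sq)
    return total


# Hypothetical input: x stands for a flattened window of metric values
# (time steps x metrics), v holds one latent vector per entry of x.
x = [1.0, 2.0, 3.0]
v = [[1.0, 0.0], [0.5, 1.0], [2.0, -1.0]]

print(pairwise_naive(x, v))
print(pairwise_linear(x, v))
```

Both functions return the same value; only the second avoids enumerating every pair, which is what keeps the cost linear in the number of inputs.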
Taxonomy
Topics: Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications · Software System Performance and Reliability
