Practical data monitoring in the internet-services domain
Nikhil Galagali

TL;DR
This paper introduces a reliable, accurate, and interpretable large-scale anomaly detection framework tailored to internet services, addressing the false alarms that arise when monitoring millions of metrics.
Contribution
The paper presents a novel anomaly detection framework that improves accuracy and interpretability for large-scale internet-service metrics monitoring.
Findings
Significantly more accurate than existing methods
Enables easy interpretation of detection models
Reduces false alarms in large-scale metric monitoring
Abstract
Large-scale monitoring, anomaly detection, and root cause analysis of metrics are essential requirements of the internet-services industry. To meet the need to continuously monitor millions of metrics, large internet-based companies use many anomaly detection approaches on a daily basis. However, despite significant progress in detecting anomalies in metrics accurately and efficiently, the sheer number of metrics means that a large number of false alarms still need to be investigated. This paper presents a framework for reliable large-scale anomaly detection. It is significantly more accurate than existing approaches and allows for easy interpretation of models, thus enabling practical data monitoring in the internet-services domain.
Taxonomy
Topics: Software System Performance and Reliability · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection
