Practical data monitoring in the internet-services domain

Nikhil Galagali

arXiv:2203.08067·cs.LG·March 18, 2022

TL;DR

This paper introduces a reliable, accurate, and interpretable framework for large-scale anomaly detection in internet services, aimed at reducing the false alarms that arise when monitoring millions of metrics.

Contribution

The paper presents a novel anomaly detection framework that improves accuracy and interpretability for large-scale internet-service metrics monitoring.

Findings

01. Significantly more accurate than existing methods
02. Enables easy interpretation of detection models
03. Reduces false alarms in large-scale metric monitoring

Abstract

Large-scale monitoring, anomaly detection, and root cause analysis of metrics are essential requirements of the internet-services industry. To address the need to continuously monitor millions of metrics, many anomaly detection approaches are being used on a daily basis by large internet-based companies. However, in spite of the significant progress made to accurately and efficiently detect anomalies in metrics, the sheer scale of the number of metrics has meant there are still a large number of false alarms that need to be investigated. This paper presents a framework for reliable large-scale anomaly detection. It is significantly more accurate than existing approaches and allows for easy interpretation of models, thus enabling practical data monitoring in the internet-services domain.

Code & Models

Repositories

nikhilgalagali/adservice (official)

Taxonomy

Topics: Software System Performance and Reliability · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection