Correlated Anomaly Detection from Large Streaming Data
Zheng Chen, Xinli Yu, Yuan Ling, Bo Song, Wei Quan, Xiaohua Hu, Erjia Yan

TL;DR
This paper identifies limitations of principal score-based methods for correlated anomaly detection in large streaming data and introduces two randomized algorithms, rPS and gPS, that improve detection accuracy and scalability.
Contribution
The paper proposes two novel randomized algorithms, rPS and gPS, to overcome principal score degeneration and enhance correlated anomaly detection in large-scale streaming data.
Findings
High and balanced recall and accuracy in experiments
Significant improvements in computational efficiency and scalability
Effective detection across various correlation strengths
Abstract
Correlated anomaly detection (CAD) from streaming data is a type of group anomaly detection and an essential task in real-time data mining applications such as botnet detection, financial event detection, and industrial process monitoring. The primary approach in previous research is based on the principal score (PS) of divided batches or sliding windows, obtained by computing the top eigenvalues of the correlation matrix, e.g. with the Lanczos algorithm. However, this paper identifies the phenomenon of principal score degeneration for large data sets, and then proves mathematically and shows in practice that current PS-based methods are likely to fail for CAD on large-scale streaming data even if the number of correlated anomalies grows with the data size at a reasonable rate; in reality, anomalies tend to be a minority of the data, so the issue can be more serious. We propose a…
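To make the PS-based baseline described in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes the principal score of a window is simply the largest eigenvalue of the window's correlation matrix, and it uses plain power iteration in place of the Lanczos algorithm to stay dependency-free. The function names (`correlation_matrix`, `top_eigenvalue`, `principal_score`, `make_window`) and the synthetic data are hypothetical and chosen only for illustration. It does not implement rPS or gPS.

```python
import math
import random


def correlation_matrix(window):
    """Pearson correlation matrix of a window.

    `window` is a list of streams; each stream is a list of equal length.
    """
    length = len(window[0])
    standardized = []
    for series in window:
        mean = sum(series) / length
        var = sum((x - mean) ** 2 for x in series) / length
        std = math.sqrt(var) or 1.0  # guard against constant streams
        standardized.append([(x - mean) / std for x in series])
    return [
        [sum(a * b for a, b in zip(zi, zj)) / length for zj in standardized]
        for zi in standardized
    ]


def top_eigenvalue(matrix, iters=200, seed=0):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration.

    Assumption: power iteration stands in for Lanczos in this sketch.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in matrix]  # random start avoids symmetric traps
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    mv = [sum(m * x for m, x in zip(row, v)) for row in matrix]
    return sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient


def principal_score(window):
    """Assumed definition: top eigenvalue of the window's correlation matrix."""
    return top_eigenvalue(correlation_matrix(window))


def make_window(n_streams, n_correlated, length, rng):
    """Synthetic window: the first `n_correlated` streams share one signal."""
    shared = [rng.gauss(0, 1) for _ in range(length)]
    window = []
    for i in range(n_streams):
        if i < n_correlated:
            window.append([s + 0.3 * rng.gauss(0, 1) for s in shared])
        else:
            window.append([rng.gauss(0, 1) for _ in range(length)])
    return window


rng = random.Random(42)
ps_base = principal_score(make_window(20, 0, 200, rng))  # no correlated group
ps_anom = principal_score(make_window(20, 5, 200, rng))  # 5 correlated streams
print(f"baseline PS: {ps_base:.2f}, anomalous PS: {ps_anom:.2f}")
```

In this toy setting the window containing a correlated group yields a higher score than the all-noise window, which is the signal a PS-based detector thresholds on. The degeneration the abstract refers to concerns how this separation behaves as the number of streams becomes large; the sketch is too small to demonstrate that and makes no claim about it.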
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Data Stream Mining Techniques
