Online Data Thinning via Multi-Subspace Tracking
Xin Jiang, Rebecca Willett

TL;DR
This paper introduces an online data thinning method that uses dynamic low-rank Gaussian mixture models to efficiently identify and preserve salient or anomalous elements in large-scale streaming high-dimensional data in real time.
Contribution
It presents a novel online anomaly detection approach based on low-rank Gaussian mixtures that adapt to dynamic environments and handle missing data, enabling scalable real-time data thinning.
Findings
Effective in real-time anomaly detection
Robust to missing data and high dimensionality
Scalable and efficient for large-scale streaming data
Abstract
In an era of ubiquitous large-scale streaming data, the availability of data far exceeds the capacity of expert human analysts. In many settings, such data is either discarded or stored unprocessed in datacenters. This paper proposes a method of online data thinning, in which large-scale streaming datasets are winnowed to preserve unique, anomalous, or salient elements for timely expert analysis. At the heart of this proposed approach is an online anomaly detection method based on dynamic, low-rank Gaussian mixture models. Specifically, the high-dimensional covariance matrices associated with the Gaussian components are associated with low-rank models. According to this model, most observations lie near a union of subspaces. The low-rank modeling mitigates the curse of dimensionality associated with anomaly detection for high-dimensional data, and recent advances in subspace clustering…
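The claim that most observations lie near a union of subspaces suggests a simple anomaly score: the distance from an observation to its nearest subspace. The following is a minimal sketch of that idea only, not the authors' algorithm. It assumes the subspaces are already known and fixed, so it does no online tracking and ignores missing data; the names `residual_scores`, `thin`, and `n_keep` are hypothetical, and the data is synthetic.

```python
import numpy as np


def residual_scores(X, bases, means):
    """For each row of X, return the smallest squared distance to any affine subspace.

    bases: list of (dim, rank) matrices with orthonormal columns (assumed known).
    means: list of (dim,) offset vectors, one per subspace.
    """
    scores = np.full(X.shape[0], np.inf)
    for U, mu in zip(bases, means):
        centered = X - mu
        projected = centered @ U @ U.T  # orthogonal projection onto span(U)
        resid = np.sum((centered - projected) ** 2, axis=1)
        scores = np.minimum(scores, resid)
    return scores


def thin(X, bases, means, n_keep):
    """Keep the n_keep rows that are farthest from every subspace."""
    scores = residual_scores(X, bases, means)
    kept = np.sort(np.argsort(scores)[-n_keep:])
    return kept, scores


# Synthetic data: two rank-3 subspaces in 50 dimensions (illustrative values only).
rng = np.random.default_rng(0)
dim, rank, n = 50, 3, 200
bases, means = [], []
for _ in range(2):
    Q, R = np.linalg.qr(rng.standard_normal((dim, rank)))
    bases.append(Q)
    means.append(rng.standard_normal(dim))

labels = rng.integers(0, 2, size=n)
X = np.stack([
    means[k] + bases[k] @ rng.standard_normal(rank) + 0.01 * rng.standard_normal(dim)
    for k in labels
])
X[:5] = 3.0 * rng.standard_normal((5, dim))  # plant five off-subspace rows

kept, scores = thin(X, bases, means, n_keep=5)
print(kept)
```

Here the planted rows are far from both subspaces, so they receive the largest scores and are the ones retained. A streaming version would additionally have to update the subspace estimates as data arrives, which this sketch deliberately leaves out.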
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data-Driven Disease Surveillance · COVID-19 epidemiological studies
