Online Data Thinning via Multi-Subspace Tracking

Xin Jiang; Rebecca Willett

arXiv:1609.03544 · stat.ML · September 13, 2016



TL;DR

This paper introduces an online data thinning method that uses dynamic low-rank Gaussian mixture models to efficiently identify and preserve salient or anomalous elements in large-scale streaming high-dimensional data in real time.

Contribution

It presents a novel online anomaly detection approach based on low-rank Gaussian mixtures that adapt to dynamic environments and handle missing data, enabling scalable real-time data thinning.
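A union-of-subspaces model can thin a stream that has missing entries. The minimal Python sketch below shows one way to do this. It assumes a fixed, already-estimated list of subspace bases and a hand-picked threshold, whereas the paper tracks a dynamic model online. It also marks missing coordinates with NaN. For each incoming vector, it fits each subspace by least squares using the observed coordinates only. It keeps the vector's index when even the best-fitting subspace leaves a residual above the threshold. The function names `residual_to_subspace` and `thin_stream` are hypothetical and do not come from the paper.

```python
import numpy as np


def residual_to_subspace(x, U):
    """Residual norm of x after projecting onto span(U), using observed entries only.

    x: length-d vector; np.nan marks a missing coordinate.
    U: d x r basis of a subspace (columns need not be orthonormal).
    """
    observed = ~np.isnan(x)
    x_obs = x[observed]
    U_obs = U[observed, :]  # keep only the rows we actually observed
    # Least-squares subspace coefficients fitted on the observed coordinates.
    w, *_ = np.linalg.lstsq(U_obs, x_obs, rcond=None)
    return float(np.linalg.norm(x_obs - U_obs @ w))


def thin_stream(stream, subspaces, threshold):
    """Return indices of vectors whose residual to *every* subspace exceeds threshold.

    Hypothetical thinning rule: a vector is kept (flagged as salient) only if
    even its nearest subspace explains it poorly.
    """
    kept = []
    for t, x in enumerate(stream):
        score = min(residual_to_subspace(x, U) for U in subspaces)
        if score > threshold:
            kept.append(t)
    return kept
```

Take bases `e0` and `e1` in R^4 and a threshold of 1.5. For the stream `[2, 0, 0, 0]`, `[0, 0, nan, 3]`, `[1, 5, nan, 0]`, only index 1 is kept. The first vector lies exactly in `span(e0)`. The third is within distance 1 of `span(e1)`. The residual is not rescaled for the number of observed entries, so vectors with many missing coordinates look less anomalous. A practical detector would need to correct for this.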

Findings

1. Effective in real-time anomaly detection
2. Robust to missing data and high dimensionality
3. Scalable and efficient for large-scale streaming data

Abstract

In an era of ubiquitous large-scale streaming data, the availability of data far exceeds the capacity of expert human analysts. In many settings, such data is either discarded or stored unprocessed in datacenters. This paper proposes a method of online data thinning, in which large-scale streaming datasets are winnowed to preserve unique, anomalous, or salient elements for timely expert analysis. At the heart of this proposed approach is an online anomaly detection method based on dynamic, low-rank Gaussian mixture models. Specifically, the high-dimensional covariance matrices associated with the Gaussian components are assumed to have low-rank structure. Under this model, most observations lie near a union of subspaces. The low-rank modeling mitigates the curse of dimensionality associated with anomaly detection for high-dimensional data, and recent advances in subspace clustering…
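A small scoring sketch can make the low-rank Gaussian mixture concrete. It makes several illustrative assumptions that are not taken from the paper. Each component k has a mean `mu_k`, an orthonormal d×r basis `U_k`, and per-direction variances `lam_k`. A shared isotropic noise variance `sigma2` completes the covariance, so Sigma_k = U_k diag(lam_k) U_kᵀ + sigma2·I. The negative log-likelihood under the mixture is then used as an anomaly score. The Woodbury identity lets each Gaussian density be evaluated in O(dr) time without ever forming a d×d matrix. This is one concrete way low-rank structure eases the cost of high dimensionality. The function names are hypothetical.

```python
import numpy as np


def low_rank_gaussian_logpdf(x, mean, U, lam, sigma2):
    """log N(x; mean, U diag(lam) U^T + sigma2 I) without forming the d x d covariance.

    Assumes U is d x r with orthonormal columns, lam holds r nonnegative
    variances along those columns, and sigma2 > 0 is isotropic noise variance.
    """
    d, r = U.shape
    y = x - mean
    coeffs = U.T @ y  # coordinates of y inside the subspace, O(d r)
    # Woodbury: Sigma^{-1} = (I - U diag(lam / (lam + sigma2)) U^T) / sigma2
    quad = (y @ y - np.sum(lam / (lam + sigma2) * coeffs**2)) / sigma2
    # Eigenvalues of Sigma: lam + sigma2 (r of them) and sigma2 (d - r of them).
    logdet = np.sum(np.log(lam + sigma2)) + (d - r) * np.log(sigma2)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)


def anomaly_score(x, weights, means, bases, variances, sigma2):
    """Negative log-likelihood of x under a mixture of low-rank Gaussians."""
    logs = np.array([
        np.log(w) + low_rank_gaussian_logpdf(x, m, U, lam, sigma2)
        for w, m, U, lam in zip(weights, means, bases, variances)
    ])
    top = logs.max()
    # Numerically stable log-sum-exp over the mixture components.
    return -(top + np.log(np.sum(np.exp(logs - top))))
```

A point near one component's subspace receives a low score. A point with energy orthogonal to every subspace is penalized through the 1/sigma2 factor and scores high. Thresholding this score is therefore one plausible thinning rule.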

