TL;DR
This paper introduces SGMRD, a novel method for real-time subspace search in high-dimensional data streams, improving pattern detection and outlier identification by leveraging dependency estimators and bandit-based monitoring.
Contribution
It generalizes subspace search to streaming data, combining dependency estimation and bandit theory for efficient, effective pattern monitoring in high-dimensional, evolving data streams.
Findings
SGMRD outperforms existing methods significantly.
It enhances downstream tasks like outlier detection.
Demonstrated effectiveness on synthetic and real-world data.
Abstract
In the real world, data streams are ubiquitous -- think of network traffic or sensor data. Mining patterns, e.g., outliers or clusters, from such data must take place in real time. This is challenging because (1) streams often have high dimensionality, and (2) the data characteristics may change over time. Existing approaches tend to focus on only one aspect, either high dimensionality or the specifics of the streaming setting. For static data, a common approach to deal with high dimensionality -- known as subspace search -- extracts low-dimensional, 'interesting' projections (subspaces), in which patterns are easier to find. In this paper, we address both Challenges (1) and (2) by generalising subspace search to data streams. Our approach, Streaming Greedy Maximum Random Deviation (SGMRD), monitors interesting subspaces in high-dimensional data streams. It leverages novel multivariate…
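To make the idea of subspace search on static data more concrete, here is a minimal sketch of a greedy search. It is not the authors' SGMRD algorithm and involves no streaming or bandit component. The dependency score (mean absolute Pearson correlation), the helper names `abs_pearson`, `subspace_score`, `greedy_subspace`, and `make_columns`, and the parameters `min_score` and `max_dims` are all assumptions made for illustration.

```python
import random
from itertools import combinations
from statistics import mean


def abs_pearson(xs, ys):
    """Absolute Pearson correlation, used as a stand-in dependency score (assumption)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def subspace_score(columns, dims):
    """Mean pairwise dependency among the chosen dimensions (needs at least two)."""
    return mean(abs_pearson(columns[i], columns[j]) for i, j in combinations(dims, 2))


def greedy_subspace(columns, start, min_score=0.8, max_dims=5):
    """Grow a subspace from `start`, adding the best-scoring dimension each round.

    Stops when the best extension scores below `min_score` or `max_dims` is reached.
    Both thresholds are hypothetical tuning knobs, not values from the paper.
    """
    subspace, score = [start], 0.0
    while len(subspace) < max_dims:
        candidates = [d for d in range(len(columns)) if d not in subspace]
        if not candidates:
            break
        best_score, best_dim = max(
            (subspace_score(columns, subspace + [d]), d) for d in candidates
        )
        if best_score < min_score:
            break
        subspace.append(best_dim)
        score = best_score
    return sorted(subspace), score


def make_columns(n=200, seed=0):
    """Toy data: dimensions 0, 1, 3 are strongly dependent; 2 and 4 are pure noise."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [v + rng.gauss(0, 0.1) for v in x0]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    x3 = [-v + rng.gauss(0, 0.1) for v in x0]
    x4 = [rng.gauss(0, 1) for _ in range(n)]
    return [x0, x1, x2, x3, x4]


columns = make_columns()
subspace, score = greedy_subspace(columns, start=0)
print(subspace, round(score, 3))
```

In this toy setup the search keeps the dimensions that move together and leaves out the noise dimensions. A streaming variant would additionally have to re-evaluate such subspaces as the data changes, which is the problem the abstract says SGMRD addresses.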
