Efficient Principal Subspace Projection of Streaming Data Through Fast Similarity Matching
Andrea Giovannucci, Victor Minden, Cengiz Pehlevan, Dmitri B. Chklovskii

TL;DR
This paper presents a fast, online algorithm for principal subspace estimation in streaming data, enabling efficient dimensionality reduction with minimal memory, and demonstrates its competitive performance on synthetic and real datasets.
Contribution
Introduces a computationally efficient similarity matching algorithm for online principal subspace estimation and releases a public test suite for benchmarking it against competing algorithms.
Findings
The method is competitive with existing online algorithms on both synthetic and real datasets.
It processes streaming data while keeping only the last sample and the current iterate in memory.
The approach is suitable for real-time applications in big data scenarios.
Abstract
Big data problems frequently require processing datasets in a streaming fashion, either because all data are available at once but collectively are larger than available memory or because the data intrinsically arrive one data point at a time and must be processed online. Here, we introduce a computationally efficient version of similarity matching, a framework for online dimensionality reduction that incrementally estimates the top K-dimensional principal subspace of streamed data while keeping in memory only the last sample and the current iterate. To assess the performance of our approach, we construct and make public a test suite containing both a synthetic data generator and the infrastructure to test online dimensionality reduction algorithms on real datasets, as well as performant implementations of our algorithm and competing algorithms with similar aims. Among the algorithms…
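The memory constraint described in the abstract (only the last sample and the current iterate are kept) can be illustrated with a minimal sketch. This sketch does not reproduce the paper's similarity matching algorithm: it substitutes the classical Oja subspace rule as a stand-in online estimator. The names `make_spiked_data`, `subspace_error`, `oja_subspace_step`, and `run_stream`, along with every parameter value, are hypothetical choices made for illustration and are not taken from the paper's test suite.

```python
import numpy as np


def make_spiked_data(n_samples, dim, k, rng):
    """Hypothetical generator: k strong directions plus weak isotropic noise."""
    # Random orthonormal basis; its first k columns span the true subspace.
    basis, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    stds = np.full(dim, 0.1)             # weak noise directions
    stds[:k] = np.linspace(2.0, 1.5, k)  # strong principal directions
    samples = (rng.standard_normal((n_samples, dim)) * stds) @ basis.T
    return samples, basis[:, :k]


def subspace_error(W, U):
    """Normalized squared Frobenius distance between two projectors.

    W has shape (k, dim); its rows span the estimated subspace.
    U has shape (dim, k); its orthonormal columns span the true subspace.
    """
    Q, _ = np.linalg.qr(W.T)  # orthonormal basis of the estimated subspace
    P_est = Q @ Q.T
    P_true = U @ U.T
    num = np.linalg.norm(P_est - P_true, "fro") ** 2
    return num / np.linalg.norm(P_true, "fro") ** 2


def oja_subspace_step(W, x, lr):
    """One in-place update of Oja's subspace rule (stand-in, not the paper's method)."""
    y = W @ x                                        # K-dimensional projection
    W += lr * (np.outer(y, x) - np.outer(y, y) @ W)  # Hebbian term minus decay
    return y


def run_stream(samples, W_init, lr):
    """Process samples one at a time; only x and W are held during the loop."""
    W = W_init.copy()
    for x in samples:
        oja_subspace_step(W, x, lr)
    return W


rng = np.random.default_rng(0)
X, U = make_spiked_data(n_samples=5000, dim=20, k=3, rng=rng)
W0 = rng.standard_normal((3, 20)) / np.sqrt(20)
W = run_stream(X, W0, lr=0.005)
print("initial subspace error:", subspace_error(W0, U))
print("final subspace error:  ", subspace_error(W, U))
```

Here the full array `X` is materialized only for convenience. In a genuinely streaming setting, the loop in `run_stream` would consume samples from an iterator, and the state carried between steps would be just the `K × dim` matrix `W`.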
Taxonomy
Topics: Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications · Data Stream Mining Techniques
