Memory Limited, Streaming PCA
Ioannis Mitliagkas, Constantine Caramanis, Prateek Jain

TL;DR
This paper introduces a streaming PCA algorithm that operates with minimal memory and achieves near-optimal sample complexity, enabling efficient high-dimensional data analysis in resource-constrained environments.
Contribution
The paper presents the first algorithm that combines $O(kp)$ memory usage with $O(p \, \log p)$ sample complexity for streaming PCA in high dimensions.
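To make the $O(kp)$ memory figure concrete, here is a minimal sketch of a block-wise power iteration over a stream. It is an illustration under stated assumptions, not a transcription of the paper's algorithm, which this summary does not spell out: the function name `block_power_pca`, the `block_size` parameter, and the random initialization are all hypothetical choices. The point it shows is that the update $x (x^\top Q)$ touches only $p \times k$ arrays, so the $p \times p$ covariance matrix is never formed.

```python
import numpy as np

def block_power_pca(samples, p, k, block_size, seed=0):
    """One-pass estimate of a k-dimensional principal subspace (illustrative sketch).

    Only two p-by-k arrays (q and acc) are held at any time, so the working
    memory is O(kp). All names and defaults here are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal starting basis.
    q, _ = np.linalg.qr(rng.standard_normal((p, k)))
    acc = np.zeros((p, k))
    count = 0
    for x in samples:
        # Accumulate x (x^T q) directly; x x^T is never materialized.
        acc += np.outer(x, x @ q)
        count += 1
        if count == block_size:
            # End of block: orthonormalize and reset the accumulator.
            q, _ = np.linalg.qr(acc)
            acc = np.zeros((p, k))
            count = 0
    # Samples in a trailing partial block are ignored in this sketch.
    return q
```

Each sample is read once and then discarded, which is what makes the procedure one-pass. The block size trades off the number of power-iteration steps against the noise within each step.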
Findings
The algorithm recovers the spike using only $O(kp)$ memory.
Theoretical guarantees are provided for the spiked covariance model.
Simulations demonstrate effectiveness on more general data models.
Abstract
We consider streaming, one-pass principal component analysis (PCA), in the high-dimensional regime, with limited memory. Here, $p$-dimensional samples are presented sequentially, and the goal is to produce the $k$-dimensional subspace that best approximates these points. Standard algorithms require $O(p^2)$ memory; meanwhile no algorithm can do better than $O(kp)$ memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the spiked covariance model, where $p$-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike) which is the signal to be recovered. It is now well-understood that the spike can…
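The spiked covariance model mentioned in the abstract can be simulated in a few lines. The sketch below draws samples whose population covariance is the identity plus a rank-$k$ perturbation, $I_p + \theta\, U U^\top$ with orthonormal $U$. The function name `sample_spiked`, the parameter `spike_strength` (playing the role of $\theta$), and the choice of Gaussian signal and noise are assumptions made for illustration; the paper's exact parameterization may differ.

```python
import numpy as np

def sample_spiked(n, p, k, spike_strength=2.0, seed=0):
    """Draw n samples with population covariance I_p + spike_strength * U U^T.

    Illustrative sketch: U is a random p-by-k orthonormal basis (the spike),
    and both the signal coefficients and the noise are standard Gaussian.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal basis for the k-dimensional spike.
    u, _ = np.linalg.qr(rng.standard_normal((p, k)))
    # Low-dimensional signal coefficients and isotropic white noise.
    z = rng.standard_normal((n, k))
    noise = rng.standard_normal((n, p))
    x = np.sqrt(spike_strength) * z @ u.T + noise
    return x, u
```

With enough samples, the top $k$ eigenvalues of the empirical covariance of `x` should sit near `1 + spike_strength` and the remaining ones near 1, which is a quick sanity check on the generator.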
Taxonomy
Topics: Random Matrices and Applications · Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques
Methods: Principal Components Analysis
