Memory Limited, Streaming PCA

Ioannis Mitliagkas; Constantine Caramanis; Prateek Jain

arXiv:1307.0032·stat.ML·July 2, 2013·66 cites

TL;DR

This paper introduces a streaming PCA algorithm that operates with minimal memory and achieves near-optimal sample complexity, enabling efficient high-dimensional data analysis in resource-constrained environments.

Contribution

The paper presents the first algorithm that combines $O(kp)$ memory usage with $O(p \, \log p)$ sample complexity for streaming PCA in high dimensions.
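To make the memory claim concrete, the arithmetic below compares storing a dense $p \times p$ covariance matrix (the $O(p^{2})$ cost the abstract attributes to standard algorithms) with storing only a $p \times k$ basis. The values of `p` and `k`, the helper name `memory_bytes`, and the 8-byte float size are illustrative assumptions, not figures from the paper.

```python
def memory_bytes(num_floats, bytes_per_float=8):
    """Bytes needed to hold num_floats values (8-byte floats assumed)."""
    return num_floats * bytes_per_float

# Hypothetical problem size, chosen only for illustration.
p, k = 100_000, 10

dense_cov = memory_bytes(p * p)  # full p x p covariance: O(p^2)
basis = memory_bytes(k * p)      # p x k subspace estimate: O(kp)

print(f"dense covariance: {dense_cov / 1e9:.0f} GB")
print(f"k-dim basis:      {basis / 1e6:.0f} MB")
```

At this assumed size the dense covariance needs 10,000 times more storage than the output basis, which is the gap a memory-limited method aims to close.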

Findings

01

The algorithm recovers the spike using only $O(kp)$ memory.

02

Theoretical guarantees are provided for the spiked covariance model.

03

Simulations demonstrate effectiveness on more general data models.

Abstract

We consider streaming, one-pass principal component analysis (PCA), in the high-dimensional regime, with limited memory. Here, $p$-dimensional samples are presented sequentially, and the goal is to produce the $k$-dimensional subspace that best approximates these points. Standard algorithms require $O(p^{2})$ memory; meanwhile no algorithm can do better than $O(kp)$ memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the spiked covariance model, where $p$-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike) which is the signal to be recovered. It is now well-understood that the spike can…
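The setting described above can be sketched in a few lines of NumPy. The sketch draws samples from a spiked covariance model and estimates the spike in one pass with a block-wise power iteration that never forms a $p \times p$ matrix. This is one plausible way to stay within $O(kp)$ memory and is not claimed here to be the paper's exact procedure. The function names, the Gaussian choice for signal and noise, the noise level `sigma`, and the block size are all assumptions made for illustration.

```python
import numpy as np

def spiked_samples(A, sigma, n, rng):
    """Yield n samples x = A z + sigma * w one at a time (assumed Gaussian z, w).

    The population covariance is A A^T + sigma^2 I: white noise plus a spike.
    """
    p, k = A.shape
    for _ in range(n):
        z = rng.standard_normal(k)
        w = rng.standard_normal(p)
        yield A @ z + sigma * w

def block_power_streaming_pca(stream, p, k, block_size, rng):
    """One-pass block power iteration that stores only p x k arrays."""
    Q, _ = np.linalg.qr(rng.standard_normal((p, k)))  # random orthonormal start
    S = np.zeros((p, k))  # running average of x (x^T Q) over the current block
    count = 0
    for x in stream:
        S += np.outer(x, x @ Q) / block_size  # rank-one update, O(kp) work
        count += 1
        if count == block_size:
            Q, _ = np.linalg.qr(S)  # re-orthonormalize at the end of each block
            S = np.zeros((p, k))
            count = 0
    return Q

# Hypothetical usage on synthetic data.
rng = np.random.default_rng(0)
p, k = 50, 2
A, _ = np.linalg.qr(rng.standard_normal((p, k)))  # true spike, orthonormal columns
Q = block_power_streaming_pca(spiked_samples(A, 0.5, 20000, rng), p, k, 2000, rng)

# Spectral norm of the part of A lying outside span(Q); small means good recovery.
err = np.linalg.norm(A - Q @ (Q.T @ A), 2)
```

Each sample is used once and discarded, and the only state kept between samples is the pair of $p \times k$ arrays `Q` and `S`, so storage stays proportional to $kp$.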

Taxonomy

Topics: Random Matrices and Applications · Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques

Methods: Principal Components Analysis