Approximating the Top Eigenvector in Random Order Streams

Praneeth Kacham; David P. Woodruff

arXiv:2412.11963 · cs.DS · December 17, 2024

TL;DR

This paper develops a randomized streaming algorithm that approximates the top eigenvector of $A^{T} A$ when the rows of $A$ arrive in uniformly random order. Its guarantees depend on the spectral gap and the number of heavy rows, and it improves on the spectral gap requirements of prior work.

Contribution

It introduces a space-efficient algorithm for top eigenvector approximation in random order streams whose space usage scales with the number of heavy rows, and it provides a lower bound showing that a dependence on the number of heavy rows is necessary for high accuracy.
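To make the setting concrete, here is a minimal sketch of a random order row stream processed with a small amount of state. It uses a plain block power iteration as a stand-in baseline; it is not the paper's algorithm and makes no use of heavy rows. The function name `block_power_stream`, the planted test matrix, and the block size are all illustrative assumptions.

```python
import numpy as np

def block_power_stream(rows, d, block_size, rng):
    """Baseline block power iteration over a row stream (NOT the paper's method).

    Keeps only two length-d vectors: the current estimate v and an accumulator.
    """
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    acc = np.zeros(d)
    count = 0
    for a in rows:
        acc += a * (a @ v)  # adds (a a^T) v without forming the d x d matrix
        count += 1
        if count == block_size:
            norm = np.linalg.norm(acc)
            if norm > 0:
                v = acc / norm  # one power step using this block only
            acc = np.zeros(d)
            count = 0
    return v  # rows in a trailing partial block are ignored

# Hypothetical test input: one strong planted direction plus Gaussian noise.
rng = np.random.default_rng(0)
n, d = 2000, 20
u = np.zeros(d)
u[0] = 1.0
A = 5.0 * rng.standard_normal((n, 1)) * u + rng.standard_normal((n, d))

stream = A[rng.permutation(n)]  # rows presented in uniformly random order
v = block_power_stream(stream, d, block_size=200, rng=rng)

# Compare against the exact top right singular vector of A.
v1 = np.linalg.svd(A, full_matrices=False)[2][0]
corr = abs(v @ v1)
```

On an input with a large gap like this one, `corr` should come out close to 1. The baseline is only meant to show what "rows arrive one at a time in random order" looks like in code.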

Findings

01

Achieves correlation $1 - O(1/R)$ with the top eigenvector using $O(h \cdot d \cdot \operatorname{polylog}(d))$ bits of space, where $h$ is the number of heavy rows and $R$ is the spectral gap parameter.

02

Provides lower bounds showing the necessity of space depending on heavy rows for high accuracy.

03

Improves spectral gap requirements for random order streams compared to prior work.

Abstract

When rows of an $n \times d$ matrix $A$ are given in a stream, we study algorithms for approximating the top eigenvector of the matrix $A^{T} A$ (equivalently, the top right singular vector of $A$). We consider worst case inputs $A$ but assume that the rows are presented to the streaming algorithm in a uniformly random order. We show that when the gap parameter $R = \sigma_{1}(A)^{2} / \sigma_{2}(A)^{2} = \Omega(1)$, then there is a randomized algorithm that uses $O(h \cdot d \cdot \operatorname{polylog}(d))$ bits of space and outputs a unit vector $v$ that has a correlation $1 - O(1/R)$ with the top eigenvector $v_{1}$. Here $h$ denotes the number of heavy rows in the matrix, defined as the rows with Euclidean norm at least $\|A\|_{F} / d \cdot \operatorname{polylog}(d)$. We also provide a lower bound showing that any algorithm using $O(hd/R)$ bits of space can obtain at most $1…
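The quantities named in the abstract can be computed offline for a small matrix, which may help when checking intuition about $R$ and $h$. The following is a minimal NumPy sketch under stated assumptions: the function names are hypothetical, the heavy row threshold is passed in by the caller because the polylog factor is not pinned down in this excerpt, and "correlation" is taken here to mean the absolute inner product with $v_{1}$, which is an assumption about the paper's convention.

```python
import numpy as np

def gap_parameter(A):
    """R = sigma_1(A)^2 / sigma_2(A)^2, computed from the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)  # returned in descending order
    return (s[0] / s[1]) ** 2

def count_heavy_rows(A, threshold):
    """Number of rows of A whose Euclidean norm is at least `threshold`."""
    return int(np.sum(np.linalg.norm(A, axis=1) >= threshold))

def correlation(v, A):
    """|<v, v_1>| after normalizing v; v_1 is the top right singular vector of A."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    v = v / np.linalg.norm(v)
    return float(abs(v @ Vt[0]))

# Tiny worked example: singular values are 3 and 1, so R = 9.
A = np.array([[3.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
R = gap_parameter(A)
h = count_heavy_rows(A, threshold=2.0)  # only the first row has norm >= 2
c = correlation(np.array([1.0, 0.0]), A)
```

These helpers use a full SVD and hold all of $A$ in memory, so they are only for inspecting small examples, not for the streaming setting itself.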

Peer Reviews

Decision · NeurIPS 2024 spotlight

Reviewer 01 · Rating 7 · Confidence 3

Strengths

Originality: First paper to study the problem of approximating the top eigenvector in the random-order arrival setting.

Quality and Clarity: Overall well-written paper.

Significance: The paper can certainly do a better job of motivating the problem. This paper is primarily a theory paper but since the problem studied is closely related to PCA, which is obviously very important in practical applications, it is easy to imagine scenarios where the algorithm described in this paper can be impleme

Weaknesses

Paper needs to do a better job motivating the problem. This is a submission to NeurIPS, not COLT. Clearly, the paper has interesting technical contributions but I think this might be a major weakness for the paper. Some suggestions for the same: It would be nice for the introduction of the paper to have at least a few lines motivating the case for high-dimensional d and why the distinction between $d^2$ and $d$ is something which matters in real-world applications. For example, see the paper “M

Code & Models

Videos

Approximating the Top Eigenvector in Random Order Streams · slideslive

Taxonomy

Topics: Data Stream Mining Techniques · Advanced Data Compression Techniques · Neural Networks and Applications