Approximating the Top Eigenvector in Random Order Streams
Praneeth Kacham, David P. Woodruff

TL;DR
This paper develops a randomized streaming algorithm for approximating the top eigenvector of a matrix whose rows arrive in a random order. Its guarantees depend on the spectral gap and the number of heavy rows, and they improve on previous bounds.
Contribution
It introduces a space-efficient algorithm for top eigenvector approximation in random order streams that leverages the concept of heavy rows, and it provides lower bounds showing that a space dependence on the number of heavy rows is necessary.
Findings
Achieves correlation close to 1 with the top eigenvector, using space that scales with the number of heavy rows, provided the spectral gap is large enough.
Provides lower bounds showing that space depending on the number of heavy rows is necessary for high accuracy.
Improves spectral gap requirements for random order streams compared to prior work.
Abstract
When rows of a matrix $A$ are given in a stream, we study algorithms for approximating the top eigenvector of $A^T A$ (equivalently, the top right singular vector of $A$). We consider worst case inputs but assume that the rows are presented to the streaming algorithm in a uniformly random order. We show that when the gap parameter is sufficiently large, there is a randomized algorithm whose space usage in bits depends on the number of heavy rows and which outputs a unit vector that has a high correlation with the top eigenvector. Here heavy rows are the rows of the matrix whose Euclidean norm is at least a given threshold. We also provide a lower bound showing that any algorithm using too little space can obtain at most $1…
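To make the objects in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm: it is the naive one-pass baseline that accumulates the full Gram matrix of the rows (quadratic space in the dimension) and then runs ordinary power iteration. The function names, the heavy-row `threshold` argument, and the choice of the absolute inner product as the "correlation" are all assumptions made for illustration; the excerpt above does not give the paper's exact threshold or correlation measure.

```python
import math
import random


def row_norm(row):
    """Euclidean norm of one row."""
    return math.sqrt(sum(x * x for x in row))


def count_heavy_rows(rows, threshold):
    """Count rows whose norm is at least `threshold` (hypothetical parameter)."""
    return sum(1 for row in rows if row_norm(row) >= threshold)


def gram_from_stream(row_stream, d):
    """One pass over the rows, accumulating the d x d Gram matrix (sum of r r^T).

    This baseline stores d*d numbers; it does not achieve the space savings
    that the paper is about.
    """
    gram = [[0.0] * d for _ in range(d)]
    for row in row_stream:
        for i in range(d):
            for j in range(d):
                gram[i][j] += row[i] * row[j]
    return gram


def top_eigenvector(gram, iters=200, seed=0):
    """Plain power iteration on a symmetric PSD matrix; returns a unit vector."""
    rng = random.Random(seed)
    d = len(gram)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: keep the current unit vector
            break
        v = [x / norm for x in w]
    return v


def correlation(u, v):
    """Absolute inner product of two unit vectors (assumed correlation measure)."""
    return abs(sum(a * b for a, b in zip(u, v)))


# Toy input: most of the mass lies along the first coordinate.
rows = [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.5]]
stream = rows[:]
random.Random(1).shuffle(stream)  # simulate a uniformly random arrival order

gram = gram_from_stream(stream, d=3)
estimate = top_eigenvector(gram)
print(count_heavy_rows(rows, threshold=2.0))
print(correlation(estimate, [1.0, 0.0, 0.0]))
```

The Gram matrix does not depend on the arrival order, so this baseline gains nothing from the random-order assumption. It serves only as a reference point against which a small-space streaming estimate could be compared.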
Peer Reviews
Decision: NeurIPS 2024 spotlight
Originality: First paper to study the problem of approximating the top eigenvector in the random-order arrival setting. Quality and Clarity: Overall well-written paper. Significance: The paper can certainly do a better job of motivating the problem. This paper is primarily a theory paper, but since the problem studied is closely related to PCA, which is obviously very important in practical applications, it is easy to imagine scenarios where the algorithm described in this paper can be impleme
Paper needs to do a better job motivating the problem. This is a submission to NeurIPS, not COLT. Clearly, the paper has interesting technical contributions but I think this might be a major weakness for the paper. Some suggestions for the same: It would be nice for the introduction of the paper to have at least a few lines motivating the case for high-dimensional d and why the distinction between $d^2$ and $d$ is something which matters in real-world applications. For example, see the paper “M
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Data Compression Techniques · Neural Networks and Applications
