TL;DR
This paper introduces a neural network-based distributed PCA algorithm that estimates the eigenvectors of the data covariance matrix across multiple machines, converges linearly to a neighborhood of the true eigenvectors, and reduces communication overhead compared to existing methods.
Contribution
It presents the Distributed Sanger's Algorithm, a novel one-time-scale neural network method for distributed PCA with proven linear convergence guarantees.
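To make the idea concrete, the following is a minimal sketch of how a Sanger-type update could be combined with neighbor averaging. The classical Sanger's rule (generalized Hebbian algorithm) direction `C X - X triu(X^T C X)` is standard; everything else is an assumption made for illustration, since this summary does not spell out the exact update: the function names, the mixing matrix `W`, the step size, the toy covariances, and the consensus-plus-local-step structure are all hypothetical and may differ from the paper's algorithm.

```python
import numpy as np


def sanger_direction(C, X):
    """Sanger's rule (generalized Hebbian) direction: C X - X triu(X^T C X).

    The upper-triangular mask lets column k interact only with columns 1..k,
    which pushes the columns toward individual eigenvectors rather than an
    arbitrary basis of the subspace.
    """
    return C @ X - X @ np.triu(X.T @ C @ X)


def distributed_sanger(local_covs, W, K, step=0.2, iters=3000, seed=0):
    """Hypothetical consensus-plus-local-update loop (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = len(local_covs), local_covs[0].shape[0]
    X0, _ = np.linalg.qr(rng.standard_normal((d, K)))  # shared orthonormal start
    X = [X0.copy() for _ in range(n)]
    for _ in range(iters):
        # Each node averages its neighbors' estimates using its row of W ...
        mixed = [sum(W[i, j] * X[j] for j in range(n)) for i in range(n)]
        # ... then adds one Sanger step computed on its own local covariance.
        X = [mixed[i] + step * sanger_direction(local_covs[i], X[i])
             for i in range(n)]
    return X


# Toy problem (assumed): 4 nodes on a ring whose local covariances average to C.
rng = np.random.default_rng(1)
d, n, K = 5, 4, 2
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
C = Q @ np.diag([1.0, 0.7, 0.4, 0.2, 0.1]) @ Q.T  # columns of Q are eigenvectors
noise = rng.standard_normal((n, d, d))
noise = 0.01 * (noise + noise.transpose(0, 2, 1))  # symmetric perturbations
noise -= noise.mean(axis=0)                        # perturbations sum to zero
local_covs = [C + E for E in noise]

W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])  # doubly stochastic ring weights

estimates = distributed_sanger(local_covs, W, K)
# |cosine| between each node's columns and the top-K eigenvectors of C.
alignment = [np.abs(np.sum(Xi * Q[:, :K], axis=0)) / np.linalg.norm(Xi, axis=0)
             for Xi in estimates]
```

In this sketch every node exchanges only its current `d x K` estimate with its neighbors once per iteration, and there is no inner consensus loop, which is one way to read "one-time-scale". Because the local covariances differ, a constant step size would be expected to bring each node near, but not exactly onto, the eigenvectors of the average covariance.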
Findings
Converges linearly to a neighborhood of the true eigenvectors.
Reduces communication overhead compared to existing methods.
Demonstrates effectiveness through numerical experiments.
Abstract
Principal Component Analysis (PCA) is the workhorse tool for dimensionality reduction in this era of big data. While often overlooked, the purpose of PCA is not only to reduce data dimensionality, but also to yield features that are uncorrelated. Furthermore, the ever-increasing volume of data in the modern world often requires storage of data samples across multiple machines, which precludes the use of centralized PCA algorithms. This paper focuses on the dual objective of PCA, namely, dimensionality reduction and decorrelation of features, but in a distributed setting. This requires estimating the eigenvectors of the data covariance matrix, as opposed to only estimating the subspace spanned by the eigenvectors, when data is distributed across a network of machines. Although a few distributed solutions to the PCA problem have been proposed recently, convergence guarantees and/or…
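The distinction the abstract draws between estimating eigenvectors and estimating only their span can be checked numerically. The sketch below uses synthetic data and an arbitrarily chosen rotation angle (both assumptions, not from the paper): projecting onto the top eigenvectors gives uncorrelated features, while projecting onto a different orthonormal basis of the same subspace generally does not.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, n_samples = 4, 2, 5000

# Synthetic zero-mean data with distinct variances along hidden directions.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
scales = np.array([3.0, 2.0, 1.0, 0.5])
samples = (rng.standard_normal((n_samples, d)) * scales) @ Q.T
samples -= samples.mean(axis=0)
C = samples.T @ samples / n_samples          # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)         # eigh returns ascending eigenvalues
V = eigvecs[:, ::-1][:, :K]                  # top-K eigenvectors

# A different orthonormal basis of the SAME subspace: rotate V by 30 degrees.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = V @ R

cov_eig = V.T @ C @ V   # covariance of features projected onto eigenvectors
cov_rot = B.T @ C @ B   # covariance of features projected onto rotated basis

off_eig = abs(cov_eig[0, 1])   # correlation left between the two features
off_rot = abs(cov_rot[0, 1])
same_subspace = np.allclose(V @ V.T, B @ B.T)
```

Both bases reduce the data to the same two-dimensional subspace, so a subspace-only method could return either one; only the eigenvector basis also decorrelates the reduced features.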
Taxonomy
Methods: Principal Components Analysis
