Distributed Estimation of Principal Eigenspaces
Jianqing Fan, Dong Wang, Kaizheng Wang, Ziwei Zhu

TL;DR
This paper introduces a distributed PCA algorithm that estimates principal eigenspaces from data stored across multiple machines, analyzes the bias, variance, and convergence rate of the resulting estimator, and compares its performance with centralized PCA.
Contribution
The paper proposes a distributed PCA method and analyzes its bias, variance, and convergence rate. The analysis extends to heterogeneous data settings and shows that the method performs comparably to centralized PCA when the number of machines is moderate.
Findings
Distributed PCA is unbiased for symmetric innovation distributions.
The convergence rate depends on the effective rank, the eigen-gap, and the number of machines.
Distributed PCA performs comparably to centralized PCA when the number of machines is moderate.
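The quantities named in these findings can be computed directly from a covariance matrix. The sketch below assumes the usual definitions, which this summary does not spell out: effective rank as trace divided by largest eigenvalue, and eigen-gap as the difference between the K-th and (K+1)-th largest eigenvalues. The function names and the toy covariance are hypothetical, chosen only for illustration.

```python
import numpy as np

def effective_rank(cov):
    # Assumed definition: trace(cov) / largest eigenvalue of cov.
    eigvals = np.linalg.eigvalsh(cov)
    return eigvals.sum() / eigvals.max()

def eigen_gap(cov, k):
    # Assumed definition: gap between the k-th and (k+1)-th largest eigenvalues.
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # sort in descending order
    return eigvals[k - 1] - eigvals[k]

# Toy covariance with two strong directions (illustrative values only).
cov = np.diag([10.0, 8.0, 1.0, 1.0])
r = effective_rank(cov)   # (10 + 8 + 1 + 1) / 10 = 2.0
gap = eigen_gap(cov, 2)   # 8 - 1 = 7.0
print(r, gap)
```

A small effective rank and a large eigen-gap both make the top eigenspace easier to estimate; the exact form of the rate is given in the paper and is not reproduced here.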
Abstract
Principal component analysis (PCA) is fundamental to statistical machine learning. It extracts latent principal factors that contribute to the most variation of the data. When data are stored across multiple machines, however, communication cost can prohibit the computation of PCA in a central location and distributed algorithms for PCA are thus needed. This paper proposes and studies a distributed PCA algorithm: each node machine computes the top eigenvectors and transmits them to the central server; the central server then aggregates the information from all the node machines and conducts a PCA based on the aggregated information. We investigate the bias and variance for the resulting distributed estimator of the top eigenvectors. In particular, we show that for distributions with symmetric innovation, the empirical top eigenspaces are unbiased and hence the distributed PCA is…
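The procedure described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that the data are mean-zero, that each machine sends its top-K eigenvector matrix, and that the server aggregates by averaging the local projection matrices before taking the top K eigenvectors of the average. The aggregation rule, the function names, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def top_eigvecs(sym, k):
    # np.linalg.eigh returns eigenvalues in ascending order,
    # so reverse the columns to get the leading eigenvectors first.
    _, vecs = np.linalg.eigh(sym)
    return vecs[:, ::-1][:, :k]

def local_pca(X, k):
    # X has shape (n, d) and is assumed mean-zero.
    cov = X.T @ X / X.shape[0]
    return top_eigvecs(cov, k)

def distributed_pca(shards, k):
    # Each shard plays the role of one node machine's data.
    d = shards[0].shape[1]
    avg_proj = np.zeros((d, d))
    for X in shards:
        V = local_pca(X, k)      # computed on the node machine
        avg_proj += V @ V.T      # assumed aggregation: average projectors
    avg_proj /= len(shards)
    return top_eigvecs(avg_proj, k)  # computed on the central server

def subspace_distance(U, V):
    # Frobenius distance between the two projection matrices.
    return np.linalg.norm(U @ U.T - V @ V.T)

# Simulated example: d = 20, two strong directions, m = 10 machines.
rng = np.random.default_rng(0)
d, k, m, n = 20, 2, 10, 500
scales = np.sqrt(np.array([10.0, 8.0] + [1.0] * (d - k)))
shards = [rng.standard_normal((n, d)) * scales for _ in range(m)]

V_dist = distributed_pca(shards, k)
V_cent = local_pca(np.vstack(shards), k)   # centralized baseline
V_true = np.eye(d)[:, :k]                  # true top-2 eigenspace

print("distributed error:", subspace_distance(V_dist, V_true))
print("centralized error:", subspace_distance(V_cent, V_true))
```

Averaging projection matrices rather than the eigenvector matrices themselves avoids the sign and rotation ambiguity of eigenvectors: each projector V V^T is the same for any orthonormal basis of a given subspace.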
Taxonomy
Topics: Random Matrices and Applications · Sparse and Compressive Sensing Techniques · Advanced Neuroimaging Techniques and Applications
