Improved Distributed Principal Component Analysis
Maria-Florina Balcan, Vandana Kanchanapally, Yingyu Liang, David Woodruff

TL;DR
This paper presents new algorithms for distributed PCA that significantly reduce communication and computational costs, enabling faster and more efficient analysis of large distributed datasets with minimal loss in accuracy.
Contribution
The authors develop improved distributed PCA algorithms with better communication and computational efficiency, and introduce techniques for high success probability subspace embeddings.
Findings
Order-of-magnitude speedup in empirical tests
Negligible degradation in solution quality
Enhanced algorithms for k-means clustering
Abstract
We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve ℓ2-error fitting problems such as k-means clustering and subspace clustering. The essential properties of an approximate distributed PCA algorithm are its communication cost and computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for k-means clustering and related problems.…
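To make the setting in the abstract concrete, here is a minimal NumPy sketch of a generic two-round distributed PCA scheme: each server sends a small SVD summary of its local points, and a coordinator runs a second SVD on the stacked summaries. This is an illustration under stated assumptions, not the improved algorithms the paper proposes. The function names (`local_summary`, `distributed_pca`, `projection_cost`) and the truncation parameter `t` are hypothetical, and the data is treated as uncentered, so PCA here means the best rank-k subspace through the origin.

```python
import numpy as np

def local_summary(P, t):
    """Server side (hypothetical helper): summarize an n_i x d point set
    by the t x d matrix Sigma_t V_t^T from its truncated SVD."""
    _, s, Vt = np.linalg.svd(P, full_matrices=False)
    t = min(t, len(s))
    return s[:t, None] * Vt[:t]          # scale each right singular vector

def distributed_pca(local_datasets, k, t):
    """Coordinator side (hypothetical helper): stack the summaries and
    return a d x k orthonormal basis from the top-k right singular vectors."""
    stacked = np.vstack([local_summary(P, t) for P in local_datasets])
    _, _, Vt = np.linalg.svd(stacked, full_matrices=False)
    return Vt[:k].T

def projection_cost(P, V):
    """Squared Frobenius error of projecting the rows of P onto span(V)."""
    return np.linalg.norm(P - P @ V @ V.T, "fro") ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four servers, each holding 200 noisy points near a shared 3-dim subspace.
    basis = rng.standard_normal((3, 20))
    servers = [rng.standard_normal((200, 3)) @ basis
               + 0.1 * rng.standard_normal((200, 20)) for _ in range(4)]
    union = np.vstack(servers)

    V = distributed_pca(servers, k=3, t=6)
    _, _, Vt = np.linalg.svd(union, full_matrices=False)
    print("distributed cost:", projection_cost(union, V))
    print("centralized cost:", projection_cost(union, Vt[:3].T))
```

Each server communicates only `t * d` numbers instead of its full point set. When `t` equals the dimension `d`, the stacked summaries have the same Gram matrix as the union of the points, so the coordinator recovers the centralized solution. Smaller `t` trades accuracy for communication.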
Taxonomy
Topics: Neural Networks and Applications · Image Retrieval and Classification Techniques · Graph Theory and Algorithms
Methods: Principal Components Analysis
