Distributed Principal Subspace Analysis for Partitioned Big Data: Algorithms, Analysis, and Implementation
Arpita Gang, Bingqing Xiang, and Waheed U. Bajwa

TL;DR
This paper introduces distributed algorithms for Principal Subspace Analysis suitable for partitioned big data, analyzes their convergence, and validates their effectiveness through extensive experiments on synthetic and real-world datasets.
Contribution
It proposes two novel distributed PSA/PCA algorithms, one for data partitioned across samples and one for data partitioned across features, with convergence analysis and practical implementation details.
Findings
Both proposed algorithms converge linearly to the true principal subspace.
In the distributed implementation, network topology affects communication cost.
Straggler machines affect the algorithms' performance and robustness.
Abstract
Principal Subspace Analysis (PSA) -- and its sibling, Principal Component Analysis (PCA) -- is one of the most popular approaches for dimensionality reduction in signal processing and machine learning. But centralized PSA/PCA solutions are fast becoming irrelevant in the modern era of big data, in which the number of samples and/or the dimensionality of samples often exceed the storage and/or computational capabilities of individual machines. This has led to the study of distributed PSA/PCA solutions, in which the data are partitioned across multiple machines and an estimate of the principal subspace is obtained through collaboration among the machines. It is in this vein that this paper revisits the problem of distributed PSA/PCA under the general framework of an arbitrarily connected network of machines that lacks a central server. The main contributions of the paper in this regard…
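To make the setting in the abstract concrete, the following is a minimal sketch of one generic way to estimate a principal subspace over a server-less network when the data are partitioned across samples: orthogonal iteration in which each node multiplies by its local covariance and then averages with its neighbors. This is an illustrative assumption, not necessarily the algorithms proposed in the paper. The ring topology, the mixing weights, the synthetic data, and all function names (`ring_weights`, `qr_pos`, `consensus_orthogonal_iteration`) are hypothetical choices made for this example.

```python
import numpy as np

def ring_weights(n_nodes):
    """Doubly stochastic mixing matrix for a ring (assumed topology):
    each node averages itself and its two neighbors with weight 1/3."""
    W = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        for j in (i - 1, i, i + 1):
            W[i, j % n_nodes] += 1.0 / 3.0
    return W

def qr_pos(A):
    """Thin QR with the diagonal of R forced positive, so that nearly
    identical inputs at different nodes give nearly identical Q factors."""
    Q, R = np.linalg.qr(A)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

def consensus_orthogonal_iteration(local_covs, W, r, n_outer=50,
                                   n_consensus=30, seed=0):
    """Each node i holds a local covariance C_i and an estimate Q_i (d x r).
    Per outer iteration: local product C_i Q_i, several rounds of neighbor
    averaging with W, then a local QR step. No central server is used."""
    n_nodes, d = len(local_covs), local_covs[0].shape[0]
    rng = np.random.default_rng(seed)
    Q0 = qr_pos(rng.standard_normal((d, r)))
    Q = np.stack([Q0] * n_nodes)  # same initial estimate at every node
    for _ in range(n_outer):
        V = np.stack([C @ Qi for C, Qi in zip(local_covs, Q)])
        for _ in range(n_consensus):
            # V_i <- sum_j W[i, j] * V_j (only neighbors have nonzero weight)
            V = np.einsum("ij,jdr->idr", W, V)
        Q = np.stack([qr_pos(Vi) for Vi in V])
    return Q

# Synthetic sample-partitioned data with a clear eigengap (assumed setup).
rng = np.random.default_rng(1)
n_nodes, d, r, n_local = 5, 20, 3, 200
basis = qr_pos(rng.standard_normal((d, d)))
scales = np.ones(d)
scales[:r] = [10.0, 8.0, 6.0]
local_covs = []
for _ in range(n_nodes):
    X = basis @ (scales[:, None] * rng.standard_normal((d, n_local)))
    local_covs.append(X @ X.T / n_local)

# Centralized reference: top-r eigenvectors of the pooled covariance.
global_cov = sum(local_covs) / n_nodes
U = np.linalg.eigh(global_cov)[1][:, -r:]  # eigh sorts eigenvalues ascending

Q_nodes = consensus_orthogonal_iteration(local_covs, ring_weights(n_nodes), r)
# Subspace error at each node: Frobenius distance between projectors.
errors = [np.linalg.norm(U @ U.T - Qi @ Qi.T) for Qi in Q_nodes]
print(errors)
```

In this sketch every node ends with its own estimate `Q_nodes[i]`, and the error compares each against the centralized solution. The number of consensus rounds trades communication for accuracy; how many are needed depends on the network topology through the mixing matrix `W`.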
