Distributed Principal Subspace Analysis for Partitioned Big Data: Algorithms, Analysis, and Implementation

Arpita Gang, Bingqing Xiang, and Waheed U. Bajwa

arXiv:2103.06406 · cs.LG · November 25, 2021

TL;DR

This paper introduces distributed algorithms for Principal Subspace Analysis (PSA) on partitioned big data, analyzes their convergence, and validates them through experiments on synthetic and real-world datasets.

Contribution

The paper proposes two distributed PSA/PCA algorithms for data partitioned across samples and across features, together with a convergence analysis and practical implementation details.

Findings

1. The proposed algorithms converge linearly to the true principal subspace.

2. The distributed implementation shows that network topology affects communication cost.

3. Straggler machines affect the algorithms' performance and robustness.

Abstract

Principal Subspace Analysis (PSA) -- and its sibling, Principal Component Analysis (PCA) -- is one of the most popular approaches for dimensionality reduction in signal processing and machine learning. But centralized PSA/PCA solutions are fast becoming irrelevant in the modern era of big data, in which the number of samples and/or the dimensionality of samples often exceed the storage and/or computational capabilities of individual machines. This has led to the study of distributed PSA/PCA solutions, in which the data are partitioned across multiple machines and an estimate of the principal subspace is obtained through collaboration among the machines. It is in this vein that this paper revisits the problem of distributed PSA/PCA under the general framework of an arbitrarily connected network of machines that lacks a central server. The main contributions of the paper in this regard…
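To make the setting in the abstract concrete, the following is a minimal sketch of serverless principal subspace estimation on sample-partitioned data. It is a generic illustration that combines orthogonal iteration with consensus averaging, and it should not be read as the paper's algorithms. The function names (`ring_weights`, `orthonormalize`, `decentralized_orthogonal_iteration`), the ring topology, and the iteration counts are all assumptions chosen for illustration.

```python
import numpy as np


def ring_weights(num_nodes):
    """Doubly stochastic mixing matrix for a ring (assumed topology).

    Each node averages its own value with those of its two ring neighbors.
    """
    W = np.zeros((num_nodes, num_nodes))
    for i in range(num_nodes):
        for j in (i - 1, i, i + 1):
            W[i, j % num_nodes] += 1.0 / 3.0
    return W


def orthonormalize(Z):
    """QR-based orthonormalization with a fixed sign convention.

    Forcing a positive diagonal in R keeps the bases at different nodes
    aligned, so that averaging them in the next round is meaningful.
    """
    Q, R = np.linalg.qr(Z)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs


def decentralized_orthogonal_iteration(local_data, W, rank,
                                       outer_iters=50, consensus_rounds=30,
                                       seed=0):
    """Estimate the top-`rank` subspace without a central server.

    local_data: list of (dim, n_i) arrays, one block of samples per node.
    W: doubly stochastic mixing matrix that matches the network graph.
    Returns an array of shape (num_nodes, dim, rank), one basis per node.
    """
    num_nodes = len(local_data)
    dim = local_data[0].shape[0]
    rng = np.random.default_rng(seed)
    Q0 = orthonormalize(rng.standard_normal((dim, rank)))
    Q = np.stack([Q0] * num_nodes)             # common starting basis
    covs = [X @ X.T for X in local_data]       # local (unnormalized) covariances

    for _ in range(outer_iters):
        # Local step: each node multiplies its basis by its own covariance.
        Z = np.stack([covs[i] @ Q[i] for i in range(num_nodes)])
        # Consensus step: repeated neighbor averaging approximates the
        # network-wide average of the local products.
        for _ in range(consensus_rounds):
            Z = np.einsum("ij,jdr->idr", W, Z)
        # Local step: each node re-orthonormalizes its own estimate.
        Q = np.stack([orthonormalize(Z[i]) for i in range(num_nodes)])
    return Q
```

In this sketch the nodes exchange only `dim × rank` matrices with their neighbors, never raw samples. The number of consensus rounds trades communication for accuracy, since sparser graphs need more rounds to reach the same agreement.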
