Dimension Reduction for Large-Scale Federated Data: Statistical Rate and Asymptotic Inference
Shuting Shen, Junwei Lu, Xihong Lin

TL;DR
This paper introduces FADI, a scalable PCA method for federated data that handles high dimensionality and large sample sizes efficiently, with theoretical guarantees and practical validation.
Contribution
The paper proposes FADI, a novel distributed PCA algorithm that accelerates computation for large-scale federated data while maintaining statistical accuracy and enabling inference.
Findings
FADI achieves the same non-asymptotic error rate as traditional PCA when the combined dimension of its parallel sketches is at least the data dimension.
FADI accelerates computation through parallel and distributed processing.
Theoretical analysis reveals a phase transition in the asymptotic distribution of FADI as its parameters vary.
Abstract
In light of the rapidly growing large-scale data in federated ecosystems, the traditional principal component analysis (PCA) is often not applicable due to privacy protection considerations and large computational burden. Algorithms were proposed to lower the computational cost, but few can handle both high dimensionality and massive sample size under distributed settings. In this paper, we propose the FAst DIstributed (FADI) PCA method for federated data when both the dimension d and the sample size n are ultra-large, by simultaneously performing parallel computing along d and distributed computing along n. Specifically, we utilize L parallel copies of p-dimensional fast sketches to divide the computing burden along d and aggregate the results distributively along the split samples. We present a general framework applicable to multiple statistical problems, and establish…
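The sketch-and-aggregate idea described in the abstract can be illustrated with a short NumPy example. This is a minimal sketch under stated assumptions, not the paper's exact procedure: the function name `sketched_distributed_pca`, the choice of Gaussian sketching matrices, the assumption that the data are already centered, and the aggregation by stacking per-sketch bases are all illustrative choices made here.

```python
import numpy as np


def sketched_distributed_pca(site_blocks, k, p, n_sketches, seed=0):
    """Illustrative sketch-and-aggregate PCA (hypothetical helper).

    site_blocks: list of (n_s, d) arrays, one per site; rows are samples,
                 assumed already centered.
    k: number of leading components to estimate.
    p: sketch dimension, assumed to satisfy k <= p <= d.
    n_sketches: number of independent sketch copies.
    """
    rng = np.random.default_rng(seed)
    d = site_blocks[0].shape[1]
    n = sum(block.shape[0] for block in site_blocks)

    bases = []
    for _ in range(n_sketches):  # copies are independent, so parallelizable
        omega = rng.standard_normal((d, p))  # assumed Gaussian sketch
        # Each site only shares the d x p summary X_s^T (X_s @ omega);
        # summing over sites yields (X^T X / n) @ omega without pooling rows.
        sketch = sum(block.T @ (block @ omega) for block in site_blocks) / n
        u, _, _ = np.linalg.svd(sketch, full_matrices=False)
        bases.append(u[:, :k])

    # Averaging the projectors U U^T over copies has the same leading
    # eigenvectors as the left singular vectors of the stacked bases.
    stacked = np.hstack(bases) / np.sqrt(n_sketches)
    u, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :k]
```

Calling the function with a list of per-site data blocks returns a d x k matrix with orthonormal columns. Each site contributes only a d x p summary per sketch copy, so raw samples never leave the site, and the loop over copies can be dispatched to separate workers.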
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Statistical Methods and Inference
