Subspace Recovery from Heterogeneous Data with Non-isotropic Noise
John Duchi, Vitaly Feldman, Lunjia Hu, Kunal Talwar

TL;DR
This paper presents a new estimator for subspace recovery in heterogeneous data with non-isotropic noise, leveraging multiple data points per user to overcome limitations of previous methods and achieve near-optimal error bounds.
Contribution
The paper introduces an efficient estimator for PCA under non-spherical, user-dependent noise that uses multiple data points per user, and establishes matching upper and lower error bounds.
Findings
Estimator achieves near-optimal error bounds.
Requires at least two data points per user.
Handles non-isotropic, user-dependent noise effectively.
Abstract
Recovering linear subspaces from data is a fundamental and important task in statistics and machine learning. Motivated by heterogeneity in Federated Learning settings, we study a basic formulation of this problem: the principal component analysis (PCA), with a focus on dealing with irregular noise. Our data come from $n$ users with user $i$ contributing data samples from a $d$-dimensional distribution with mean $\mu_i$. Our goal is to recover the linear subspace shared by $\mu_1, \ldots, \mu_n$ using the data points from all users, where every data point from user $i$ is formed by adding an independent mean-zero noise vector to $\mu_i$. If we only have one data point from every user, subspace recovery is information-theoretically impossible when the covariance matrices of the noise vectors can be non-spherical, necessitating additional restrictive assumptions in previous work. We avoid…
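To illustrate why a second data point per user helps, here is a minimal NumPy sketch. It is not the paper's estimator; it only demonstrates the standard observation that, for two samples from the same user with independent mean-zero noise, the expected cross product equals the outer product of the user's mean with itself, whereas the second moment of a single sample also contains the user's noise covariance. The function names, the diagonal noise model, and all parameter values are assumptions made for this illustration.

```python
import numpy as np


def top_eigvecs(m, k):
    """Return the k eigenvectors of a symmetric matrix with the largest eigenvalues."""
    _, vecs = np.linalg.eigh(m)  # eigh returns eigenvalues in ascending order
    return vecs[:, -k:]


def simulate_users(n, d, k, rng):
    """Hypothetical generator: n users, two samples each, means in a shared k-dim subspace."""
    basis, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal columns
    means = rng.standard_normal((n, k)) @ basis.T
    # Assumed noise model: diagonal covariance that is uneven across
    # coordinates (non-isotropic) and rescaled per user (user-dependent).
    profile = np.linspace(0.2, 2.0, d)
    scales = rng.uniform(0.5, 1.5, size=(n, 1)) * profile
    x1 = means + scales * rng.standard_normal((n, d))
    x2 = means + scales * rng.standard_normal((n, d))
    return basis, x1, x2


def cross_product_subspace(x1, x2, k):
    """Top-k eigenvectors of the symmetrized sum over users of x1_i x2_i^T."""
    m = x1.T @ x2  # sum over users of the outer product of the two samples
    return top_eigvecs((m + m.T) / 2.0, k)


def subspace_error(u, v):
    """Spectral norm of the difference between the two orthogonal projectors."""
    return np.linalg.norm(u @ u.T - v @ v.T, ord=2)


rng = np.random.default_rng(0)
basis, x1, x2 = simulate_users(n=20000, d=10, k=2, rng=rng)
paired = cross_product_subspace(x1, x2, k=2)
single = top_eigvecs(x1.T @ x1, k=2)  # uses only one sample per user
print("paired-sample error:", subspace_error(paired, basis))
print("single-sample error:", subspace_error(single, basis))
```

In this synthetic setup the single-sample second-moment matrix is biased by the summed noise covariances, so its top eigenvectors drift toward the noisiest coordinates, while the cross-product matrix has no such bias. The sketch weights all users equally, which is a simplification chosen here and not something taken from the source.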
Taxonomy
Topics: Statistical Methods and Inference · Sparse and Compressive Sensing Techniques · Distributed Sensor Networks and Detection Algorithms
Methods: Linear Regression
