Scale-Calibrated Median-of-Means for Robust Distributed Principal Component Analysis
Kisung You

TL;DR
This paper introduces a scale-calibrated median-of-means estimator for robust distributed PCA. It aggregates node-level mean and principal-subspace estimates under an explicit relative scale between mean error and subspace error.
Contribution
It develops a novel median-of-means estimator for distributed PCA on the product of Euclidean space and the Grassmann manifold, with explicit scale calibration and theoretical guarantees.
Findings
The estimator has non-Gaussian limits when the number of nodes is fixed, and Gaussian limits with finite-block bias when the number of nodes grows.
Proposed calibration rules improve robustness and inference accuracy.
Simulations and RNA-seq data demonstrate effective adaptation to eigengap-driven uncertainty.
Abstract
Distributed principal component analysis (PCA) produces node-level estimates of both a mean vector and a principal subspace. Robustly aggregating these heterogeneous objects requires a relative scale between mean error and subspace error. We study a scale-calibrated median-of-means estimator for this problem using the product geometry of Euclidean space and the Grassmann manifold. A node-level PCA expansion shows that the mean component has the usual linear influence, whereas the subspace component is an eigengap-weighted covariance perturbation. We prove a local reduction showing that the proposed product-manifold median-of-means estimator is asymptotically equivalent to a scaled spatial median of node influence errors. This yields fixed-node non-Gaussian limits, growing-node Gaussian limits with finite-block bias, and an explicit scale-dependent covariance formula. We propose robust…
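To make the aggregation problem concrete, here is a minimal Python sketch, not the paper's estimator. It assumes each node reports a sample mean and an orthonormal basis for its top-k principal subspace, and it measures disagreement between two nodes by a product distance that combines Euclidean distance on the means with Grassmann geodesic distance (from principal angles) on the subspaces. The `scale` parameter is a hypothetical stand-in for the relative scale the abstract describes; the paper's calibration rules are not reproduced here. For simplicity the sketch returns the medoid (the node estimate with the smallest total distance to the others) rather than a true geometric median on the product manifold. All function names are illustrative.

```python
import numpy as np


def node_pca(X, k):
    """Node-level estimate: sample mean and a basis for the top-k principal subspace."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(cov)
    U = eigvecs[:, ::-1][:, :k]
    return mean, U


def grassmann_dist(U, V):
    """Geodesic distance between span(U) and span(V) from principal angles."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    angles = np.arccos(np.clip(s, -1.0, 1.0))
    return float(np.linalg.norm(angles))


def product_dist(a, b, scale):
    """Product distance; `scale` (an assumption) weights subspace error against mean error."""
    (m1, U1), (m2, U2) = a, b
    mean_part = np.sum((m1 - m2) ** 2)
    subspace_part = (scale * grassmann_dist(U1, U2)) ** 2
    return float(np.sqrt(mean_part + subspace_part))


def medoid_aggregate(estimates, scale):
    """Return the index and value of the node estimate minimizing total product distance.

    This is a medoid, a simplified substitute for a geometric median
    computed on the product manifold.
    """
    n = len(estimates)
    D = np.array(
        [[product_dist(estimates[i], estimates[j], scale) for j in range(n)]
         for i in range(n)]
    )
    best = int(np.argmin(D.sum(axis=1)))
    return best, estimates[best]
```

Calling `node_pca(X, k)` on each node's data and passing the resulting list to `medoid_aggregate(estimates, scale=1.0)` gives an aggregate that a small number of corrupted nodes cannot drag far. Changing `scale` changes how much a subspace disagreement counts relative to a mean disagreement, which is the trade-off that a calibration rule has to settle.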
