Fast, Exact Bootstrap Principal Component Analysis for p>1 million
Aaron Fisher, Brian Caffo, Brian Schwartz, Vadim Zipunnikov

TL;DR
This paper introduces a fast, exact bootstrap PCA method for high-dimensional data where p exceeds 1 million, enabling efficient computation of principal components and their uncertainties.
Contribution
The authors develop a novel approach that leverages the shared subspace of bootstrap samples to compute PCA statistics efficiently in extremely high-dimensional settings.
Findings
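Cuts the time to compute bootstrap standard errors for the first 3 principal components of a brain MRI dataset (p ≈ 3 million, n = 352, 1000 bootstrap samples) from approximately 4 days with standard methods to 47 minutes.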
Provides standard errors for principal components on large MRI datasets using standard laptops.
Reduces computational complexity by representing bootstrap components in a low-dimensional subspace.
Abstract
Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject (p) is much larger than the number of subjects (n), the challenge of calculating and storing the leading principal components from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap principal components, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same n-dimensional subspace as the original sample. As a result, all bootstrap principal components are limited to the same n-dimensional subspace and can be efficiently represented by their low dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of…
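The subspace argument in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes the data are stored as a p × n matrix (measurements in rows, subjects in columns), and the function names fit_basis, bootstrap_coords, and bootstrap_pca are hypothetical. The idea: take one SVD of the centered data to get an orthonormal basis V (p × n), then resample and decompose only the n × n coordinate matrix. Each bootstrap principal component is V times an n-dimensional coordinate vector, so the p-dimensional components never need to be stored.

```python
import numpy as np

def fit_basis(Y):
    """One thin SVD of the centered p x n data: Yc = V @ DUt."""
    Yc = Y - Y.mean(axis=1, keepdims=True)       # center each measurement across subjects
    V, d, Ut = np.linalg.svd(Yc, full_matrices=False)
    return V, d[:, None] * Ut                    # V: p x n basis, DUt: n x n coordinates

def bootstrap_coords(DUt, idx, k):
    """PCA of one bootstrap sample, done entirely in n dimensions."""
    Zb = DUt[:, idx]                             # resample subjects (columns)
    Zb = Zb - Zb.mean(axis=1, keepdims=True)     # recenter the bootstrap sample
    Ab, sb, _ = np.linalg.svd(Zb, full_matrices=False)
    # Original PC j has coordinates e_j, so Ab[j, j] is its inner product
    # with bootstrap PC j; flip signs so that inner product is non-negative.
    signs = np.sign(np.diag(Ab)[:k])
    signs[signs == 0] = 1.0
    return Ab[:, :k] * signs, sb[:k]

def bootstrap_pca(Y, n_boot=200, k=3, seed=0):
    """Return the basis V, bootstrap coordinates A, and elementwise PC standard errors."""
    rng = np.random.default_rng(seed)
    n = Y.shape[1]
    V, DUt = fit_basis(Y)
    A = np.empty((n_boot, n, k))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)         # sample n subjects with replacement
        A[b], _ = bootstrap_coords(DUt, idx, k)
    # Var[(V a)_i] = V[i, :] @ Cov(a) @ V[i, :], so the p x k standard errors
    # follow from n x n covariance matrices of the coordinates.
    se = np.empty((V.shape[0], k))
    for j in range(k):
        C = np.cov(A[:, :, j], rowvar=False)
        se[:, j] = np.sqrt(np.maximum(np.einsum("in,nm,im->i", V, C, V), 0.0))
    return V, A, se
```

A call such as bootstrap_pca(Y, n_boot=1000, k=3) would then cost one p × n SVD plus 1000 SVDs of n × n matrices. The sign-alignment step is one simple convention chosen for this sketch; other conventions are possible.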
Taxonomy
Topics: Functional Brain Connectivity Studies · Statistical Methods and Inference · Health, Environment, Cognitive Aging
