On Principal Components Regression, Random Projections, and Column Subsampling
Martin Slawski

TL;DR
This paper compares principal components regression, random projections, and column sub-sampling for dimension reduction in linear regression, analyzing their statistical performance and computational efficiency through theory and experiments.
Contribution
It provides a theoretical analysis showing random projections can approximate PCR prediction error and compares different randomized dimension reduction methods.
Findings
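Random projections satisfying a Johnson-Lindenstrauss embedding property achieve prediction error close to that of PCR, at the cost of requiring slightly more random projections than principal components.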
Column sub-sampling offers a cheaper alternative with comparable performance.
Numerical results demonstrate the trade-offs between methods on synthetic and real data.
Abstract
Principal Components Regression (PCR) is a traditional tool for dimension reduction in linear regression that has been both criticized and defended. One concern about PCR is that obtaining the leading principal components tends to be computationally demanding for large data sets. While random projections do not possess the optimality properties of the leading principal subspace, they are computationally appealing and hence have become increasingly popular in recent years. In this paper, we present an analysis showing that for random projections satisfying a Johnson-Lindenstrauss embedding property, the prediction error in subsequent regression is close to that of PCR, at the expense of requiring a slightly larger number of random projections than principal components. Column sub-sampling constitutes an even cheaper way of randomized dimension reduction outside the class of…
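The three reduction schemes named in the abstract can be put side by side in a minimal sketch. Everything below is an illustrative assumption rather than the paper's setup: the Gaussian matrix is just one common construction that satisfies a Johnson-Lindenstrauss-type property with high probability, the columns are sampled uniformly without replacement, and the helper names (`pcr_map`, `gaussian_map`, `column_map`, `fit_reduced`) and the synthetic data are hypothetical.

```python
import numpy as np

def pcr_map(X, k):
    """d x k matrix of the top-k right singular vectors of X (principal directions)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T

def gaussian_map(d, k, rng):
    """d x k Gaussian random projection (assumed choice of JL-type map)."""
    return rng.standard_normal((d, k)) / np.sqrt(k)

def column_map(d, k, rng):
    """d x k selection matrix picking k columns uniformly without replacement."""
    idx = rng.choice(d, size=k, replace=False)
    R = np.zeros((d, k))
    R[idx, np.arange(k)] = 1.0
    return R

def fit_reduced(X, y, R):
    """Least squares of y on the reduced design X @ R; returns in-sample fitted values."""
    Z = X @ R
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Z @ coef

# Synthetic data (assumption): approximately low-rank design plus noise.
rng = np.random.default_rng(0)
n, d, r, k = 200, 500, 10, 20
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))
X += 0.1 * rng.standard_normal((n, d))
X -= X.mean(axis=0)                       # center columns, as is usual for PCR
beta = rng.standard_normal(d) / np.sqrt(d)
y = X @ beta + 0.5 * rng.standard_normal(n)
y -= y.mean()

maps = {
    "pcr": pcr_map(X, k),
    "gaussian": gaussian_map(d, k, rng),
    "columns": column_map(d, k, rng),
}
# In-sample mean squared error of the regression after each reduction.
mse = {name: float(np.mean((y - fit_reduced(X, y, R)) ** 2)) for name, R in maps.items()}
for name, value in mse.items():
    print(f"{name:>8s}: in-sample MSE = {value:.4f}")
```

The only part that differs between the methods is how the d x k matrix R is built: PCR needs an SVD of X, the Gaussian map needs a dense matrix product, and column sub-sampling needs only indexing, which reflects the ordering of computational cost described in the abstract. The sketch reports in-sample error only; it does not reproduce the paper's analysis or experiments.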
Taxonomy
Methods: Linear Regression
