On Robustness of Principal Component Regression
Anish Agarwal, Devavrat Shah, Dennis Shen, Dogyoon Song

TL;DR
This paper shows that principal component regression (PCR), used without modification, is robust to noisy, missing, and mixed-valued (discrete and continuous) covariates. It provides a finite-sample analysis and connects PCR to the robust synthetic control method, with implications for privacy and matrix estimation.
Contribution
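The paper establishes that PCR, applied without modification, is robust to noisy, missing, and mixed-valued covariates; links PCR to robust synthetic control; and advances the analysis of hard singular value thresholding (HSVT) with stronger norm guarantees.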
Findings
PCR is equivalent to linear regression after HSVT preprocessing.
PCR's robustness allows it to handle noisy and transformed covariates.
Finite-sample analysis of robust synthetic control is provided.
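The first finding can be checked numerically. The sketch below is an illustration rather than the authors' code: it assumes PCR is run on the uncentered covariate matrix, that HSVT keeps the top k singular values, and that the regression after HSVT takes the minimum-norm least-squares solution. The function names (hsvt, pcr_predict, hsvt_ols_predict) and the synthetic data are hypothetical.

```python
import numpy as np

def hsvt(X, k):
    """Hard singular value thresholding: keep the top-k singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def pcr_predict(X, y, k):
    """PCR fitted values: regress y on the top-k principal component scores of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:k, :].T                  # n x k matrix of component scores
    gamma, *_ = np.linalg.lstsq(scores, y, rcond=None)
    return scores @ gamma

def hsvt_ols_predict(X, y, k):
    """Fitted values from least squares on the HSVT-processed covariates."""
    X_k = hsvt(X, k)
    # X_k has rank k, so the problem is rank deficient; an explicit rcond
    # makes lstsq return the minimum-norm solution on the top-k subspace.
    beta, *_ = np.linalg.lstsq(X_k, y, rcond=1e-10)
    return X_k @ beta

# Synthetic low-rank covariates with additive noise (illustrative values only).
rng = np.random.default_rng(0)
n, p, r = 200, 30, 3
A = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))
X = A + 0.1 * rng.normal(size=(n, p))
y = A @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

pred_pcr = pcr_predict(X, y, r)
pred_hsvt = hsvt_ols_predict(X, y, r)
max_gap = np.max(np.abs(pred_pcr - pred_hsvt))
```

Both routines project y onto the span of the top k left singular vectors of X, so max_gap should be zero up to floating-point error.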
Abstract
Principal component regression (PCR) is a simple, but powerful and ubiquitously utilized method. Its effectiveness is well established when the covariates exhibit low-rank structure. However, its ability to handle settings with noisy, missing, and mixed-valued, i.e., discrete and continuous, covariates is not understood and remains an important open challenge. As the main contribution of this work we establish the robustness of PCR, without any change, in this respect and provide meaningful finite-sample analysis. To do so, we establish that PCR is equivalent to performing linear regression after pre-processing the covariate matrix via hard singular value thresholding (HSVT). As a result, in the context of counterfactual analysis using observational data, we show PCR is equivalent to the recently proposed robust variant of the synthetic control method, known as robust synthetic control…
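To make the HSVT pre-processing step concrete for missing covariates, here is a minimal sketch. It assumes one common convention that the abstract above does not spell out: missing entries are filled with zero and the matrix is rescaled by the observed fraction before thresholding. The helper name hsvt_fill, the rank k, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def hsvt_fill(Z, k):
    """Zero-fill missing entries (np.nan), rescale, and keep the top-k singular values.

    The zero-fill and rescaling convention is an assumption of this sketch.
    """
    mask = ~np.isnan(Z)
    p_hat = max(mask.mean(), 1.0 / Z.size)    # observed fraction, guarded against zero
    filled = np.where(mask, Z, 0.0) / p_hat   # rescale so the expectation matches the signal
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Simulated low-rank covariates with noise and about 30% of entries missing.
rng = np.random.default_rng(1)
n_rows, n_cols, r = 300, 50, 3
A = rng.normal(size=(n_rows, r)) @ rng.normal(size=(r, n_cols))
Z = A + 0.1 * rng.normal(size=(n_rows, n_cols))
Z[rng.random(size=(n_rows, n_cols)) < 0.3] = np.nan

A_hat = hsvt_fill(Z, r)
zero_filled = np.where(np.isnan(Z), 0.0, Z)   # naive baseline with no thresholding

# Relative Frobenius-norm errors against the noiseless low-rank matrix.
err_hsvt = np.linalg.norm(A_hat - A) / np.linalg.norm(A)
err_zero = np.linalg.norm(zero_filled - A) / np.linalg.norm(A)
```

Comparing err_hsvt with err_zero shows how much the thresholding step recovers relative to naive zero-filling. The denoised matrix A_hat would then be passed to ordinary least squares in place of the raw covariates.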
Taxonomy
Topics: Statistical Methods and Inference · Sparse and Compressive Sensing Techniques · Distributed Sensor Networks and Detection Algorithms
Methods: Linear Regression
