Factor models and variable selection in high-dimensional regression analysis
Alois Kneip, Pascal Sarda

TL;DR
This paper introduces a factor-based approach to high-dimensional linear regression, combining model selection with principal component analysis to better capture the influence of common factors and specific variables.
Contribution
It proposes incorporating principal components into regression models and provides theoretical and finite sample analysis for these estimators in high-dimensional settings.
Findings
Including principal components as additional explanatory variables captures common-factor effects on the response that a small number of individual variables cannot.
Finite sample inequalities for component estimates are established.
Simulation studies demonstrate the effectiveness of the proposed approach.
Abstract
The paper considers linear regression problems where the number of predictor variables is possibly larger than the sample size. The basic motivation of the study is to combine the points of view of model selection and functional regression by using a factor approach: it is assumed that the predictor vector can be decomposed into a sum of two uncorrelated random components reflecting common factors and specific variabilities of the explanatory variables. It is shown that the traditional assumption of a sparse vector of parameters is restrictive in this context. Common factors may possess a significant influence on the response variable which cannot be captured by the specific effects of a small number of individual variables. We therefore propose to include principal components as additional explanatory variables in an augmented regression model. We give finite sample inequalities for…
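The augmented regression described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`top_principal_components`, `lasso_cd`, `augmented_fit`), the column scaling, the choice of an L1 penalty applied to all coefficients, and the fixed number of components `k` are assumptions made here for illustration only. The sketch computes the leading principal components of the centered predictors, appends them to the original variables, and fits a sparse coefficient vector on the augmented design.

```python
import numpy as np

def top_principal_components(X, k):
    """Scores of the first k principal components of column-centered X."""
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt, so the PC scores are U[:, :k] * S[:k].
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]

def lasso_cd(A, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - A b||^2 + lam * ||b||_1.

    Assumption: a plain Lasso stands in for whatever selection
    procedure the paper actually analyses.
    """
    n, p = A.shape
    b = np.zeros(p)
    col_sq = (A ** 2).sum(axis=0) / n
    r = y.copy()  # running residual y - A @ b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += A[:, j] * b[j]            # remove column j's contribution
            rho = A[:, j] @ r / n
            # Soft-thresholding update for coordinate j.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= A[:, j] * b[j]            # add the updated contribution back
    return b

def augmented_fit(X, y, k, lam):
    """Fit y on [k principal components, original variables] with an L1 penalty.

    Returns (coefficients of the components, coefficients of the variables),
    both on the scale of the normalized columns.
    """
    Xc = X - X.mean(axis=0)
    pcs = top_principal_components(X, k)
    A = np.hstack([pcs, Xc])
    A = A / np.sqrt((A ** 2).mean(axis=0))  # hypothetical scaling: unit mean square
    coef = lasso_cd(A, y - y.mean(), lam)
    return coef[:k], coef[k:]
```

With simulated data in which `X` is a low-rank factor part plus noise, `augmented_fit(X, y, k=2, lam=0.05)` returns one coefficient vector for the two components and one for the individual variables. In practice both `k` and `lam` would have to be chosen from the data, which this sketch does not attempt.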
