Trust, but verify: benefits and pitfalls of least-squares refitting in high dimensions
Johannes Lederer

TL;DR
This paper investigates the advantages and disadvantages of least-squares refitting in high-dimensional regression, highlighting how correlations in design matrices influence its effectiveness for prediction and estimation, and proposing a criterion to guide its use.
Contribution
The paper provides new theoretical and numerical insights into when least-squares refitting is beneficial in high-dimensional settings, especially considering design matrix correlations, and introduces a practical criterion for its application.
Findings
Least-squares refitting can be beneficial even with highly correlated design matrices, but it is not advisable in every setting.
Correlations in the design matrix strongly affect whether refitting helps, for both prediction and estimation.
A new criterion helps decide when to apply least-squares refitting.
Abstract
Least-squares refitting is widely used in high dimensional regression to reduce the prediction bias of l1-penalized estimators (e.g., Lasso and Square-Root Lasso). We present theoretical and numerical results that provide new insights into the benefits and pitfalls of least-squares refitting. In particular, we consider both prediction and estimation, and we pay close attention to the effects of correlations in the design matrices of linear regression models, since these correlations - although often neglected - are crucial in the context of linear regression, especially in high dimensional contexts. First, we demonstrate that the benefit of least-squares refitting strongly depends on the setting and task under consideration: least-squares refitting can be beneficial even for settings with highly correlated design matrices but is not advisable for all settings, and least-squares…
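The two-step procedure the abstract refers to (fit an l1-penalized estimator, then rerun ordinary least squares on the selected variables only) can be sketched as follows. This is a minimal illustration and not the paper's code or its proposed criterion. The coordinate-descent solver, the function names `lasso_cd` and `refit_least_squares`, the equicorrelated design, and the tuning parameter `lam` are all assumptions made for the example.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent (illustrative solver).

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column scaling
    resid = y.astype(float).copy()      # equals y - X @ beta throughout
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def refit_least_squares(X, y, beta, tol=1e-10):
    """Refit by ordinary least squares on the support of `beta`."""
    support = np.flatnonzero(np.abs(beta) > tol)
    refit = np.zeros_like(beta)
    if support.size:
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        refit[support] = coef
    return refit, support


# Hypothetical simulation: equicorrelated design, sparse true signal.
rng = np.random.default_rng(0)
n, p, s, corr = 50, 100, 5, 0.5
shared = rng.standard_normal((n, 1))
X = np.sqrt(1 - corr) * rng.standard_normal((n, p)) + np.sqrt(corr) * shared
beta_true = np.zeros(p)
beta_true[:s] = 1.0
y = X @ beta_true + rng.standard_normal(n)

lam = np.sqrt(2 * np.log(p) / n)        # assumed tuning choice, not from the paper
beta_lasso = lasso_cd(X, y, lam)
beta_refit, support = refit_least_squares(X, y, beta_lasso)

# In-sample prediction error with respect to the noiseless signal X @ beta_true.
err_lasso = np.mean((X @ (beta_lasso - beta_true)) ** 2)
err_refit = np.mean((X @ (beta_refit - beta_true)) ** 2)
print(f"selected variables: {support.size}")
print(f"prediction error, Lasso: {err_lasso:.3f}; refitted: {err_refit:.3f}")
```

Changing `corr`, `lam`, or the sparsity `s` in this sketch changes which of the two errors is smaller, which is the kind of setting dependence the abstract describes. The refit always fits the training data at least as closely as the Lasso on the same support, so a smaller training residual alone does not indicate better prediction.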
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Advanced Causal Inference Techniques
