On overfitting and post-selection uncertainty assessments
Liang Hong, Todd A. Kuffner, Ryan Martin

TL;DR
This paper investigates how data-driven model selection in regression can lead to overfitting and invalid inference, emphasizing the importance of accounting for model selection uncertainty.
Contribution
The paper explains, in terms of overfitting, why inference that ignores data-driven model selection can be invalid, for a class of model selection criteria.
Findings
Classical linear model theory, applied naively to the selected sub-model, may not be valid.
The problem arises because the selected sub-model depends on the data, which the paper explains in terms of overfitting.
Ignoring this selection uncertainty can lead to misleading inference.
Abstract
In a regression context, when the relevant subset of explanatory variables is uncertain, it is common to use a data-driven model selection procedure. Classical linear model theory, applied naively to the selected sub-model, may not be valid because it ignores the selected sub-model's dependence on the data. We provide an explanation of this phenomenon, in terms of overfitting, for a class of model selection criteria.
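The effect described in the abstract can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: the response is pure noise (no predictor matters), there are p = 10 candidate predictors, and the "selection procedure" simply keeps the one-variable model with the smallest residual sum of squares. Because all candidates have the same number of parameters, this is the model that criteria such as AIC or BIC would also pick among them. The function name `naive_rejection_rates` and the hard-coded critical value are illustrative choices.

```python
import numpy as np


def naive_rejection_rates(n=100, p=10, reps=2000, t_crit=1.9845, seed=0):
    """Compare naive 5%-level t-tests for a pre-specified predictor versus
    a predictor chosen by the data, when no predictor truly matters.

    t_crit is the approximate 97.5% quantile of a t distribution with
    n - 2 = 98 degrees of freedom; it is hard-coded for the default n = 100
    and should be changed if n is changed.
    """
    rng = np.random.default_rng(seed)
    fixed_hits = 0
    selected_hits = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)  # global null: y is independent of X

        # Simple linear regression (with intercept) of y on each column of X.
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        sxx = (Xc ** 2).sum(axis=0)
        sxy = Xc.T @ yc
        beta = sxy / sxx
        rss = (yc ** 2).sum() - beta * sxy
        se = np.sqrt(rss / (n - 2) / sxx)
        t_abs = np.abs(beta / se)

        # Pre-specified model: always test the first predictor.
        fixed_hits += int(t_abs[0] > t_crit)

        # Data-driven model: keep the best-fitting predictor, then test it
        # as if it had been chosen in advance.
        j = int(np.argmin(rss))
        selected_hits += int(t_abs[j] > t_crit)

    return fixed_hits / reps, selected_hits / reps


if __name__ == "__main__":
    fixed, selected = naive_rejection_rates()
    print(f"pre-specified predictor: {fixed:.3f}")
    print(f"selected predictor:      {selected:.3f}")
```

Under these assumptions the rejection rate for the pre-specified predictor should sit near the nominal 5%, while the rate for the selected predictor should be far higher. If the ten t-statistics were exactly independent, the selected rate would be about 1 - 0.95^10 ≈ 0.40. The selected model fits the noise better than any fixed model would, which is the overfitting that makes the naive standard errors and tests too optimistic.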
Taxonomy
Topics: Statistical Methods and Inference · Fault Detection and Control Systems · Advanced Statistical Methods and Models
