On overfitting and post-selection uncertainty assessments

Liang Hong; Todd A. Kuffner; Ryan Martin

arXiv:1712.02379 · math.ST · December 8, 2017


TL;DR

This paper investigates how data-driven model selection in regression can lead to overfitting and invalid inference, emphasizing the importance of accounting for model selection uncertainty.

Contribution

The paper explains, in terms of overfitting, why classical linear model theory applied naively to a data-selected sub-model may be invalid, for a class of model selection criteria.

Findings

01. Classical linear model theory, applied naively to the selected sub-model, may not be valid after data-driven model selection.

02. The failure can be explained by overfitting: the selected sub-model itself depends on the data.

03. Ignoring selection uncertainty can lead to misleading inference.

Abstract

In a regression context, when the relevant subset of explanatory variables is uncertain, it is common to use a data-driven model selection procedure. Classical linear model theory, applied naively to the selected sub-model, may not be valid because it ignores the selected sub-model's dependence on the data. We provide an explanation of this phenomenon, in terms of overfitting, for a class of model selection criteria.
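The phenomenon described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's procedure: it assumes a deliberately simple selection rule (choose the single predictor with the smallest residual sum of squares among one-variable models) and a response that is pure noise, so every true slope is zero. The function name `naive_rejection_rates`, the sample sizes, and the hardcoded critical value are all assumptions made for illustration.

```python
import numpy as np


def naive_rejection_rates(n=100, p=10, reps=2000, seed=0, t_crit=1.984):
    """Compare naive 5% t-tests with and without data-driven selection.

    y is generated independently of X, so any rejection is a false positive.
    t_crit is the approximate 0.975 quantile of a t distribution with
    n - 2 = 98 degrees of freedom; it is only appropriate for n = 100.
    """
    rng = np.random.default_rng(seed)
    rejected_selected = 0
    rejected_fixed = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)  # no predictor is truly relevant

        # Center so each one-variable fit implicitly includes an intercept.
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()

        # Simple-regression slope and residual sum of squares per column.
        sxx = (Xc ** 2).sum(axis=0)
        sxy = Xc.T @ yc
        beta = sxy / sxx
        rss = (yc ** 2).sum() - beta * sxy

        # Naive t-statistic for every column, as if it had been fixed in advance.
        sigma2 = rss / (n - 2)
        t_stats = beta / np.sqrt(sigma2 / sxx)

        # Data-driven choice: the column with the smallest RSS.
        j = int(np.argmin(rss))
        rejected_selected += abs(t_stats[j]) > t_crit

        # Pre-specified choice: always column 0, chosen without looking at y.
        rejected_fixed += abs(t_stats[0]) > t_crit

    return rejected_selected / reps, rejected_fixed / reps


if __name__ == "__main__":
    selected, fixed = naive_rejection_rates()
    print(f"false-positive rate, selected predictor: {selected:.3f}")
    print(f"false-positive rate, fixed predictor:    {fixed:.3f}")
```

Under these assumptions, the test on the pre-specified predictor should reject at close to the nominal 5% rate, while the same test applied to the selected predictor should reject far more often. The selected column is the one that happened to fit the noise best, which is the overfitting effect: its slope estimate is inflated and its residual variance estimate is too small, yet the naive t-test treats the column as if it had been chosen before seeing the data.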


Taxonomy

Topics: Statistical Methods and Inference · Fault Detection and Control Systems · Advanced Statistical Methods and Models