Evaluation and selection of models for out-of-sample prediction when the sample size is small relative to the complexity of the data-generating process

Hannes Leeb

arXiv:0802.3364·stat.ME·October 24, 2008

TL;DR

This paper studies model selection for out-of-sample prediction in random-design regression when the sample size is small relative to the complexity of the data-generating process, relying on explicit finite-sample results rather than large-sample approximations.

Contribution

It offers explicit finite-sample results for model selection in small-sample, high-complexity settings, extending beyond traditional large-sample asymptotic analyses.

Findings

01 Finite-sample bounds for model selection accuracy

02 Analysis applicable when the number of candidate models exceeds the sample size

03 Insights into model performance in high-dimensional, small-sample contexts

Abstract

In regression with random design, we study the problem of selecting a model that performs well for out-of-sample prediction. We do not assume that any of the candidate models under consideration are correct. Our analysis is based on explicit finite-sample results. Our main findings differ from those of other analyses that are based on traditional large-sample limit approximations because we consider a situation where the sample size is small relative to the complexity of the data-generating process, in the sense that the number of parameters in a 'good' model is of the same order as the sample size. Also, we allow for the case where the number of candidate models is (much) larger than the sample size.
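
To make the setting concrete, here is a minimal Python sketch of selecting among misspecified linear models by estimated out-of-sample prediction error. It is not the procedure analyzed in the paper, which the abstract does not spell out. It swaps in a plain hold-out split as a generic stand-in. The function name `holdout_select`, the simulated design, the decaying coefficient vector, and the nested candidate family are all assumptions chosen for illustration.

```python
import numpy as np

def holdout_select(X, y, candidate_models, n_train):
    """Return (index of best candidate, list of hold-out errors).

    Each candidate is a tuple of column indices. Hypothetical helper:
    fits least squares on the first n_train rows and scores each
    candidate by mean squared prediction error on the remaining rows.
    """
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]
    errors = []
    for cols in candidate_models:
        cols = list(cols)
        # Least-squares fit using only the regressors in this candidate.
        beta, *_ = np.linalg.lstsq(X_tr[:, cols], y_tr, rcond=None)
        resid = y_te - X_te[:, cols] @ beta
        errors.append(float(np.mean(resid ** 2)))
    best = int(np.argmin(errors))
    return best, errors

# Assumed toy setup: random Gaussian design with n = 60 observations and
# p = 40 regressors, so a reasonable model size is of the same order as n.
rng = np.random.default_rng(0)
n, p = 60, 40
X = rng.standard_normal((n, p))
# Slowly decaying coefficients: every proper submodel is misspecified.
beta_true = 1.0 / np.arange(1, p + 1)
y = X @ beta_true + rng.standard_normal(n)

# Nested candidates using the first k regressors, k = 1..30.
candidates = [tuple(range(k)) for k in range(1, 31)]
best, errors = holdout_select(X, y, candidates, n_train=40)
print("selected model size:", len(candidates[best]))
print("hold-out MSE of selected model:", errors[best])
```

The nested family keeps the example short. The abstract's case of far more candidate models than observations would correspond to a much larger `candidates` list, for instance many non-nested subsets of columns.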
