Evaluation and selection of models for out-of-sample prediction when the sample size is small relative to the complexity of the data-generating process
Hannes Leeb

TL;DR
This paper studies model selection for out-of-sample prediction in random-design regression when the sample size is small relative to the complexity of the data-generating process, relying on explicit finite-sample results rather than large-sample approximations.
Contribution
It offers explicit finite-sample results for model selection in small-sample, high-complexity settings, extending beyond traditional large-sample asymptotic analyses.
Findings
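Explicit finite-sample results on selecting models for out-of-sample prediction, with no candidate model assumed to be correct
Analysis that remains applicable when the number of candidate models is (much) larger than the sample size
Conclusions that differ from traditional large-sample analyses when the number of parameters in a 'good' model is of the same order as the sample size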
Abstract
In regression with random design, we study the problem of selecting a model that performs well for out-of-sample prediction. We do not assume that any of the candidate models under consideration are correct. Our analysis is based on explicit finite-sample results. Our main findings differ from those of other analyses that are based on traditional large-sample limit approximations because we consider a situation where the sample size is small relative to the complexity of the data-generating process, in the sense that the number of parameters in a 'good' model is of the same order as sample size. Also, we allow for the case where the number of candidate models is (much) larger than sample size.
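To make the setting in the abstract concrete, here is a minimal simulation sketch. It is not the paper's procedure. It only illustrates the regime: random design, candidate submodels that are all misspecified, and a sample size of the same order as the number of parameters in a well-performing model. The data-generating process (Gaussian regressors, coefficients decaying as 1/j), the nested candidate family, and the names `simulate` and `out_of_sample_mse` are all assumptions made for illustration. The out-of-sample error is approximated with a large fresh sample, which would not be available in practice.

```python
import numpy as np


def out_of_sample_mse(X_train, y_train, X_test, y_test, k):
    """Fit least squares on the first k regressors; return MSE on fresh data."""
    coef, *_ = np.linalg.lstsq(X_train[:, :k], y_train, rcond=None)
    resid = y_test - X_test[:, :k] @ coef
    return float(np.mean(resid ** 2))


def simulate(n=40, p=200, n_test=5000, seed=0):
    """Compare nested submodels by approximate out-of-sample error.

    Assumed (hypothetical) setup: p regressors with slowly decaying
    coefficients, so every submodel with k < p regressors is misspecified.
    """
    rng = np.random.default_rng(seed)
    beta = 1.0 / np.arange(1, p + 1)

    # Training sample: small relative to p.
    X = rng.standard_normal((n, p))
    y = X @ beta + rng.standard_normal(n)

    # Large independent sample, used only to approximate prediction error.
    X_new = rng.standard_normal((n_test, p))
    y_new = X_new @ beta + rng.standard_normal(n_test)

    # Candidate models: the first k regressors, for k = 1, ..., n - 2.
    sizes = range(1, n - 1)
    errors = {k: out_of_sample_mse(X, y, X_new, y_new, k) for k in sizes}
    best_k = min(errors, key=errors.get)
    return errors, best_k


if __name__ == "__main__":
    errors, best_k = simulate()
    print("model size with smallest approximate out-of-sample MSE:", best_k)
```

In a run like this, submodels whose size approaches n tend to predict poorly even though they fit the training data closely, which is the kind of effect that large-sample approximations with a fixed number of parameters do not capture.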
