TL;DR
This paper introduces a generalized information criterion for model selection. The criterion is proven to achieve asymptotically optimal prediction performance and applies to complex, high-dimensional, and streaming-data settings.
Contribution
The paper proposes a generalized version of Takeuchi's information criterion with proven asymptotic optimality, together with an online algorithm for streaming data, advancing model selection theory and practice.
Findings
Achieves asymptotically optimal out-sample prediction loss.
Reduces computational cost compared to cross-validation.
Effective for high-dimensional and streaming data models.
Abstract
A central issue of many statistical learning problems is to select an appropriate model from a set of candidate models. Large models tend to inflate the variance (or overfitting), while small models tend to cause biases (or underfitting) for a given fixed dataset. In this work, we address the critical challenge of model selection to strike a balance between model fitting and model complexity, thus gaining reliable predictive power. We consider the task of approaching the theoretical limit of statistical learning, meaning that the selected model has the predictive performance that is as good as the best possible model given a class of potentially misspecified candidate models. We propose a generalized notion of Takeuchi's information criterion and prove that the proposed method can asymptotically achieve the optimal out-sample prediction loss under reasonable assumptions. It is the first…
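To make the idea of an information criterion concrete, the sketch below computes the classical Takeuchi information criterion (TIC) for least-squares fits and uses it to pick a polynomial degree. It is not the generalized criterion proposed in the paper, whose exact form is not given in this summary. The sketch assumes the squared loss 0.5 * (y - x·θ)², the penalty tr(V⁻¹J)/n with V the average Hessian and J the average outer product of per-sample gradients, and hypothetical helper names (`tic_least_squares`, `select_polynomial_degree`) and synthetic data chosen purely for illustration.

```python
import numpy as np


def tic_least_squares(X, y):
    """Classical TIC for the squared loss 0.5 * (y - x @ theta)**2.

    Returns in-sample loss + trace(V^{-1} J) / n, where V is the average
    Hessian and J the average outer product of per-sample gradients.
    Illustrative only; not the paper's generalized criterion.
    """
    n = X.shape[0]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # single model fit
    resid = y - X @ theta
    in_sample_loss = 0.5 * np.mean(resid ** 2)

    V = X.T @ X / n                  # average Hessian of the loss
    grads = -resid[:, None] * X      # per-sample gradients, shape (n, p)
    J = grads.T @ grads / n          # average gradient outer product

    penalty = np.trace(np.linalg.solve(V, J)) / n
    return in_sample_loss + penalty


def select_polynomial_degree(x, y, max_degree):
    """Score polynomial models of degree 0..max_degree and return the best."""
    scores = {}
    for degree in range(max_degree + 1):
        X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
        scores[degree] = tic_least_squares(X, y)
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical synthetic data: a quadratic signal plus Gaussian noise.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=200)
    y = 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.3 * rng.standard_normal(200)

    best, scores = select_polynomial_degree(x, y, max_degree=6)
    for degree, score in scores.items():
        print(f"degree {degree}: TIC = {score:.4f}")
    print("selected degree:", best)
```

Each candidate is fitted once and penalized analytically, whereas K-fold cross-validation would refit every candidate K times. That difference is the kind of computational saving the findings refer to.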
