Finite sample improvement of Akaike's Information Criterion

Adrien Saumard; Fabien Navarro

arXiv:1803.02078 · math.ST · July 23, 2018

TL;DR

This paper introduces an improved version of Akaike's Information Criterion that reduces overfitting in small samples by incorporating an over-penalization approach, supported by theoretical guarantees and empirical results.

Contribution

It proposes an over-penalization modification of AIC that improves model selection, especially for small samples, and proves its nonasymptotic optimality for histogram selection in density estimation.

Findings

01

The modified criterion avoids overfitting in small samples.

02

It satisfies sharp oracle inequalities for the Kullback-Leibler divergence in density estimation.

03

In experiments, the criterion outperforms AICc for histogram bin size selection.

Abstract

We emphasize that it is possible to improve the principle of unbiased risk estimation for model selection by addressing excess risk deviations in the design of penalization procedures. Indeed, we propose a modification of Akaike's Information Criterion that avoids overfitting, even when the sample size is small. We call this correction an over-penalization procedure. As proof of concept, we show the nonasymptotic optimality of our histogram selection procedure in density estimation by establishing sharp oracle inequalities for the Kullback-Leibler divergence. One of the main features of our theoretical results is that they include the estimation of unbounded logdensities. To do so, we prove several analytical and probabilistic lemmas that are of independent interest. In an experimental study, we also demonstrate state-of-the-art performance of our over-penalization criterion for bin…
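For orientation, the baseline criteria this work builds on can be sketched for histogram bin selection. This is a minimal sketch, assuming data rescaled to [0, 1) and regular partitions into d bins; the function names (`histogram_loglik`, `aic`, `aicc`, `select_bins`) are hypothetical, and the paper's over-penalization term is deliberately not reproduced because it is not stated in this summary. AIC is written in its usual "minimize" form, -2 log-likelihood + 2k with k = d - 1 free parameters, and AICc adds the standard small-sample term 2k(k+1)/(n - k - 1).

```python
import math
import random


def histogram_loglik(data, d):
    """Log-likelihood of the regular d-bin histogram density estimator on [0, 1)."""
    n = len(data)
    counts = [0] * d
    for x in data:
        # Map x to its bin index; clamp x == 1.0 into the last bin.
        j = min(int(x * d), d - 1)
        counts[j] += 1
    # The estimated density on bin j is counts[j] * d / n (bin width 1/d).
    # Empty bins contribute 0 to the sum (0 * log 0 is taken as 0).
    return sum(c * math.log(c * d / n) for c in counts if c > 0)


def aic(data, d):
    """AIC in minimize form: -2 * log-likelihood + 2k, with k = d - 1."""
    k = d - 1
    return -2.0 * histogram_loglik(data, d) + 2.0 * k


def aicc(data, d):
    """AICc: AIC + 2k(k+1)/(n-k-1); treated as infinite when n <= k + 1."""
    n = len(data)
    k = d - 1
    if n - k - 1 <= 0:
        return math.inf
    return aic(data, d) + 2.0 * k * (k + 1) / (n - k - 1)


def select_bins(data, criterion, max_bins):
    """Return the number of bins in 1..max_bins minimizing the criterion.

    Ties are broken toward the smaller number of bins (min keeps the first).
    """
    return min(range(1, max_bins + 1), key=lambda d: criterion(data, d))


if __name__ == "__main__":
    random.seed(0)
    # A small sample from a Beta(2, 5) density, which lives on (0, 1).
    sample = [random.betavariate(2, 5) for _ in range(100)]
    print("AIC bins: ", select_bins(sample, aic, 20))
    print("AICc bins:", select_bins(sample, aicc, 20))
```

Because the AICc term is nonnegative and increases with k, AICc never selects more bins than AIC on the same data in this sketch. Any other penalized criterion, such as an over-penalized variant, would plug into `select_bins` the same way, as another callable.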