Gibbs-Based Information Criteria and the Over-Parameterized Regime

Haobo Chen; Yuheng Bu; Gregory W. Wornell

arXiv:2306.05583 · cs.LG · November 15, 2023 · 1 cites

Open Access

TL;DR

This paper extends information-criterion analysis to over-parameterized models learned by the Gibbs algorithm, deriving Gibbs-based AIC and BIC and using the new BIC formulations to study the double-descent phenomenon in high-dimensional settings.

Contribution

It introduces Gibbs-based AIC and BIC with information-theoretic penalties, extending classical criteria to over-parameterized models and analyzing their role in double-descent.

Findings

01

Gibbs-based BIC can effectively select high-dimensional models.

02

The analysis reveals a mismatch between marginal likelihood and population risk.

03

Experiments confirm the relevance of the new BIC in understanding double-descent.

Abstract

Double-descent refers to the unexpected drop in test loss of a learning algorithm beyond an interpolating threshold with over-parameterization, which is not predicted by information criteria in their classical forms due to the limitations in the standard asymptotic approach. We update these analyses using the information risk minimization framework and provide Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) for models learned by the Gibbs algorithm. Notably, the penalty terms for the Gibbs-based AIC and BIC correspond to specific information measures, i.e., symmetrized KL information and KL divergence. We extend this information-theoretic analysis to over-parameterized models by providing two different Gibbs-based BICs to compute the marginal likelihood of random feature models in the regime where the number of parameters $p$ and the number of samples $n$ …
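The link between a Gibbs posterior, a marginal likelihood, and a KL-divergence penalty can be illustrated on a toy problem. The sketch below is not the paper's construction: it assumes a finite grid of candidate parameters, a uniform prior, a squared-error loss, and an inverse temperature `gamma` equal to the sample size, all of which are illustrative choices. It checks the standard variational identity that the negative log normalizing constant of a Gibbs posterior equals `gamma` times the posterior-averaged empirical loss plus the KL divergence from the posterior to the prior.

```python
import numpy as np

def gibbs_posterior(losses, prior, gamma):
    """Gibbs posterior q(w) proportional to prior(w) * exp(-gamma * loss(w)) on a finite grid.

    Returns the posterior and the log normalizing constant log Z.
    """
    log_weights = np.log(prior) - gamma * losses
    log_z = np.logaddexp.reduce(log_weights)  # numerically stable log-sum-exp
    return np.exp(log_weights - log_z), float(log_z)

def kl_divergence(q, p):
    """KL(q || p) for discrete distributions, skipping entries where q is zero."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Toy data and hypothesis grid (assumptions for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=20)
grid = np.linspace(-3.0, 3.0, 61)                # candidate means w
prior = np.full(grid.size, 1.0 / grid.size)      # uniform prior over the grid
losses = np.array([0.5 * np.mean((x - w) ** 2) for w in grid])  # empirical risk

gamma = float(x.size)                            # inverse temperature
q, log_z = gibbs_posterior(losses, prior, gamma)

fit_term = gamma * float(q @ losses)             # posterior-averaged loss, scaled
penalty = kl_divergence(q, prior)                # KL(posterior || prior)

print("-log Z        :", -log_z)
print("fit + penalty :", fit_term + penalty)
```

With a loss equal to a negative log-likelihood and `gamma` equal to the number of samples, `-log Z` plays the role of a negative log marginal likelihood, so the KL term acts as the complexity penalty in this toy setting. How the paper defines its two Gibbs-based BICs for random feature models is not shown in the truncated abstract and is not reproduced here.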

Taxonomy

Topics: Neural Networks and Applications · Bayesian Methods and Mixture Models · Machine Learning and Algorithms