Why you don't overfit, and don't need Bayes if you only train for one epoch

Laurence Aitchison

arXiv:2411.14478 · cs.LG · November 25, 2024

TL;DR

In data-rich, single-epoch training, standard maximum likelihood training directly optimizes the loss under the true data-generating process (equivalently, the test loss), so Bayesian methods are not expected to offer extra protection against overfitting or better calibration.

Contribution

The paper demonstrates that in one-epoch training, maximum likelihood and Bayesian model averaging optimize the same objective, reducing the need for Bayesian approaches in such settings.

Findings

01. Maximum likelihood training for a single epoch optimizes the true data-generating process (DGP) loss, which is equivalent to the test loss.

02. The Bayesian model average optimizes the same objective, while taking an expectation over the uncertainty induced by finite data.

03. Bayesian inference is therefore not expected to offer advantages in overfitting or calibration in single-epoch training.
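
The first finding can be sketched in a few lines. This is an illustrative reconstruction, not the paper's own derivation: the symbols $p^*$ (the true DGP), $\theta_t$ (parameters before step $t$), $\mathcal{L}_{\text{DGP}}$ and $\hat{\mathcal{L}}_N$ are notation assumed here, and the argument assumes the gradient and the expectation can be exchanged.

```latex
\begin{align}
  % DGP loss: expected negative log-likelihood under the true distribution (assumed notation)
  \mathcal{L}_{\text{DGP}}(\theta)
    &= \mathbb{E}_{(x,y)\sim p^*}\!\left[-\log p_\theta(y \mid x)\right] \\
  % Single epoch: (x_t, y_t) is a fresh draw from p^*, hence independent of \theta_t
  \mathbb{E}_{(x_t,y_t)\sim p^*}\!\left[
      \nabla_\theta \bigl(-\log p_\theta(y_t \mid x_t)\bigr)\Big|_{\theta=\theta_t}
      \,\middle|\, \theta_t \right]
    &= \nabla_\theta \mathcal{L}_{\text{DGP}}(\theta)\Big|_{\theta=\theta_t} \\
  % Multiple epochs: the same N points are reused, so the objective is the empirical loss
  \hat{\mathcal{L}}_N(\theta)
    &= \frac{1}{N}\sum_{i=1}^{N} -\log p_\theta(y_i \mid x_i)
\end{align}
```

In the single-epoch case each stochastic gradient is an unbiased estimate of the gradient of the test loss. Once datapoints are revisited, $\theta_t$ depends on the points it is evaluated on, and training descends $\hat{\mathcal{L}}_N$, which can fall below $\mathcal{L}_{\text{DGP}}$.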

Abstract

Here, we show that in the data-rich setting where you only train on each datapoint once (or equivalently, you only train for one epoch), standard "maximum likelihood" training optimizes the true data generating process (DGP) loss, which is equivalent to the test loss. Further, we show that the Bayesian model average optimizes the same objective, albeit while taking the expectation over uncertainty induced by finite data. As standard maximum likelihood training in the single-epoch setting optimizes the same objective as Bayesian inference, we argue that we do not expect Bayesian inference to offer any advantages in terms of overfitting or calibration in these settings. This explains the diminishing importance of Bayes in areas such as LLMs, which are often trained with one (or very few) epochs.
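
The abstract's claim can be checked on a toy problem. The sketch below is not from the paper: it assumes a linear-Gaussian DGP and plain SGD on squared error (maximum likelihood under Gaussian noise), and every name and hyperparameter (`run_demo`, `sample_batch`, `dim`, `lr`, and so on) is hypothetical. It compares the loss measured on fresh batches just before each update (single epoch) with the training loss on a small dataset that is reused many times (multi epoch).

```python
import numpy as np

def sample_batch(rng, w_true, n, noise_std=1.0):
    """Draw n fresh points from an assumed linear-Gaussian DGP."""
    x = rng.normal(size=(n, w_true.size))
    y = x @ w_true + noise_std * rng.normal(size=n)
    return x, y

def mse(w, x, y):
    """Mean squared error; proportional to Gaussian negative log-likelihood."""
    return float(np.mean((x @ w - y) ** 2))

def sgd_step(w, x, y, lr):
    """One gradient step on the mean squared error of the given batch."""
    grad = 2.0 * x.T @ (x @ w - y) / len(y)
    return w - lr * grad

def run_demo(seed=0, dim=20, batch=10, steps=2000, lr=0.01, tail=500):
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)
    x_test, y_test = sample_batch(rng, w_true, 20000)

    # Single epoch: every step sees a batch that was never used before.
    w = np.zeros(dim)
    fresh_losses = []
    for t in range(steps):
        x, y = sample_batch(rng, w_true, batch)
        if t >= steps - tail:
            # Loss on the batch *before* updating on it.
            fresh_losses.append(mse(w, x, y))
        w = sgd_step(w, x, y, lr)
    single_train = float(np.mean(fresh_losses))
    single_test = mse(w, x_test, y_test)

    # Multi epoch: the same 25 points are reused at every step.
    x_small, y_small = sample_batch(rng, w_true, 25)
    w = np.zeros(dim)
    for _ in range(steps):
        w = sgd_step(w, x_small, y_small, lr)
    multi_train = mse(w, x_small, y_small)
    multi_test = mse(w, x_test, y_test)

    return {
        "single_epoch_train": single_train,
        "single_epoch_test": single_test,
        "multi_epoch_train": multi_train,
        "multi_epoch_test": multi_test,
    }

if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the single-epoch training and test losses should roughly coincide, while the reused dataset should show a clear train/test gap, which is the overfitting that the single-epoch setting avoids.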

Taxonomy

Topics: Data Quality and Management · Semantic Web and Ontologies