A Bayesian Model Selection Criterion for Selecting Pretraining Checkpoints

Michael Munn; Susan Wei

arXiv:2410.05612·cs.LG·May 30, 2025

TL;DR

This paper proposes a Bayesian model selection criterion called downstream free energy to identify pretraining checkpoints that are most adaptable for downstream tasks, without requiring access to downstream data.

Contribution

The paper introduces a Bayesian criterion, the downstream free energy, for selecting pretraining checkpoints. The criterion correlates with downstream performance and requires neither downstream data nor prior knowledge of the task.

Findings

01. The criterion reliably predicts better finetuning outcomes.

02. It correlates well with downstream task adaptability.

03. The method is effective across different models and tasks.

Abstract

Recent advances in artificial intelligence have been fueled by the development of foundation models such as BERT, GPT, T5, and Vision Transformers. These models are first pretrained on vast and diverse datasets and then adapted to specific downstream tasks, often with significantly less data. However, the mechanisms behind the success of this ubiquitous pretrain-then-adapt paradigm remain underexplored, particularly the characteristics of pretraining checkpoints that enhance downstream adaptation. We introduce a Bayesian model selection criterion, called the downstream free energy, which quantifies a checkpoint's adaptability by measuring the concentration of nearby favorable parameters for the downstream task. We demonstrate that this Bayesian model selection criterion can be effectively implemented without access to the downstream data or prior knowledge of the downstream task.…
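The phrase "concentration of nearby favorable parameters" can be made concrete with a toy calculation. The sketch below is not the paper's estimator and uses no details beyond the abstract: it assumes a one-dimensional parameter, a uniform prior on an interval around the checkpoint, and a local free energy defined as the negative log of the integral of exp(-n * loss) over that interval. The names `local_free_energy`, `sharp_loss`, and `flat_loss` are hypothetical and chosen for illustration only.

```python
import math

def local_free_energy(loss, center, radius, n, num_points=2001):
    """Toy local free energy in one dimension (an illustrative assumption).

    Returns -log of the integral of exp(-n * loss(w)) * prior(w) over
    [center - radius, center + radius], with a uniform prior on that interval.
    The integral is approximated with the midpoint rule.
    """
    step = 2.0 * radius / num_points
    log_terms = []
    for i in range(num_points):
        w = center - radius + (i + 0.5) * step  # midpoint of the i-th cell
        log_terms.append(-n * loss(w))
    # Log-sum-exp keeps the computation stable when n * loss is large.
    m = max(log_terms)
    log_integral = m + math.log(sum(math.exp(t - m) for t in log_terms)) + math.log(step)
    log_prior_density = -math.log(2.0 * radius)  # uniform prior on the interval
    return -(log_integral + log_prior_density)

# Two hypothetical losses with the same minimum value at w = 0:
def sharp_loss(w):
    return 50.0 * w * w   # narrow basin: few nearby low-loss parameters

def flat_loss(w):
    return 0.5 * w * w    # wide basin: many nearby low-loss parameters

f_sharp = local_free_energy(sharp_loss, center=0.0, radius=1.0, n=100)
f_flat = local_free_energy(flat_loss, center=0.0, radius=1.0, n=100)
print(f"sharp basin: {f_sharp:.3f}, flat basin: {f_flat:.3f}")
```

Under these assumptions, both checkpoints reach the same minimum loss, yet the flat basin yields the lower free energy because more of the neighborhood has low loss. This is only the intuition behind preferring low free energy; how the criterion is evaluated without downstream data is described in the paper itself, not here.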

Videos

A Bayesian Model Selection Criterion for Selecting Pretraining Checkpoints · SlidesLive

Taxonomy

Topics: Speech Recognition and Synthesis · Engineering Applied Research · Nuclear and radioactivity studies

Methods: Attention Is All You Need · Gated Linear Unit · WordPiece · SentencePiece · Inverse Square Root Schedule · Linear Layer · Cosine Annealing · Adafactor · Linear Warmup With Linear Decay