PTPP-Aware Adaptation Scaling Laws: Predicting Domain-Adaptation Performance at Unseen Pre-Training Budgets

Etienne Goffinet; Shane Bergsma; Avraham Sheinin; Natalia Vassilieva; Shaheer Muhammad; Preslav Nakov; Gurpreet Gosal

arXiv:2510.23198 · cs.LG · October 28, 2025

TL;DR

This paper introduces PTPP-aware adaptation scaling laws that explicitly incorporate pre-training budgets, enabling accurate prediction of domain adaptation performance at unseen pre-training scales and aiding in planning adaptation strategies.

Contribution

The paper develops PTPP-aware scaling laws that improve prediction accuracy for domain adaptation performance across different pre-training budgets, addressing limitations of previous fixed-budget models.

Findings

01

PTPP-aware laws outperform a PTPP-agnostic transfer baseline on a multilingual adaptation setup (English/Arabic to French).

02

Laws fit only on early pre-training stages (PTPP = {15, 31}) predict target loss at a much larger, unseen budget (PTPP = 279).

03

Practical use cases include planning adaptation budgets under compute constraints.

Abstract

Continual pre-training (CPT) for domain adaptation must balance target-domain gains with stability on the base domain. Existing CPT scaling laws typically assume a fixed pre-training budget, which limits their ability to forecast adaptation outcomes for models trained at different tokens-per-parameter (PTPP) ratios. We present PTPP-aware adaptation scaling laws that make the pre-training budget an explicit variable, enabling accurate prediction of adaptation loss at unseen PTPP. On a multilingual setup (English/Arabic $\to$ French), PTPP-aware formulations trained on early stages (PTPP = {15, 31}) predict target loss at PTPP = 279 and outperform a PTPP-agnostic D-CPT transfer baseline on metrics (Huber-on-log, MAE$_{\mathrm{rel}}$, calibration slope); full diagnostics (RMSE, MAPE) are in the appendix. Beyond forecasting, we show a practical use case: planning replay…
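The abstract names three evaluation metrics without defining them here. The sketch below shows one plausible reading of each, not the paper's own definitions: Huber-on-log is taken to be a Huber loss on the residual between log-predicted and log-observed loss, MAE_rel the mean absolute error divided by the observed value, and calibration slope the least-squares slope of observed loss regressed on predicted loss (1.0 would indicate well-calibrated predictions). The function names, the `delta=1.0` threshold, and the sample numbers are all assumptions made for illustration.

```python
import math


def huber_on_log(y_true, y_pred, delta=1.0):
    """Mean Huber loss on log-space residuals (assumed definition).

    delta is a hypothetical threshold between the quadratic and linear regimes.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(math.log(p) - math.log(t))
        if r <= delta:
            total += 0.5 * r * r
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(y_true)


def mae_rel(y_true, y_pred):
    """Mean absolute error relative to the observed value (assumed definition)."""
    return sum(abs(p - t) / t for t, p in zip(y_true, y_pred)) / len(y_true)


def calibration_slope(y_true, y_pred):
    """Least-squares slope of observed values regressed on predictions
    (assumed definition; requires predictions that are not all identical)."""
    n = len(y_true)
    mean_p = sum(y_pred) / n
    mean_t = sum(y_true) / n
    cov = sum((p - mean_p) * (t - mean_t) for t, p in zip(y_true, y_pred))
    var = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / var


if __name__ == "__main__":
    # Made-up loss values, purely illustrative; not results from the paper.
    observed = [2.10, 2.05, 1.98, 1.95]
    predicted = [2.14, 2.03, 2.00, 1.93]
    print("Huber-on-log:", huber_on_log(observed, predicted))
    print("MAE_rel:", mae_rel(observed, predicted))
    print("Calibration slope:", calibration_slope(observed, predicted))
```

Working in log space makes the Huber term sensitive to relative rather than absolute error, which is a common choice when losses span different scales; whether the paper uses the same threshold or regression direction is not stated in this summary.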
