PTPP-Aware Adaptation Scaling Laws: Predicting Domain-Adaptation Performance at Unseen Pre-Training Budgets
Etienne Goffinet, Shane Bergsma, Avraham Sheinin, Natalia Vassilieva, Shaheer Muhammad, Preslav Nakov, Gurpreet Gosal

TL;DR
This paper introduces PTPP-aware adaptation scaling laws, which treat the pre-training budget (tokens per parameter, PTPP) as an explicit variable. This enables accurate prediction of domain-adaptation performance at unseen pre-training scales and helps with planning adaptation strategies.
Contribution
The paper develops PTPP-aware scaling laws that improve prediction accuracy for domain-adaptation performance across different pre-training budgets. This addresses a key limitation of existing continual pre-training (CPT) scaling laws, which assume a fixed pre-training budget.
Findings
PTPP-aware laws outperform a PTPP-agnostic D-CPT transfer baseline in a multilingual adaptation setup.
Laws fit on early pre-training stages (PTPP = 15 and 31) accurately predict target loss at a much larger budget (PTPP = 279).
Practical use cases include planning adaptation budgets under compute constraints.
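The budget-planning use case can be illustrated by inverting a fitted law to find how many adaptation tokens reach a target loss, then checking that against a compute cap. The sketch below is a minimal example under loud assumptions: the law L(D, P) = E + A * D^(-ALPHA) * P^(-BETA) and its parameter values are made up, the 1B-parameter model is hypothetical, and cost uses the common 6 * N * D FLOP approximation. It handles only a target-loss goal and a FLOP cap, not base-domain stability constraints.

```python
import math

# ASSUMPTION: an already-fitted PTPP-aware law with made-up parameter values.
#   L(D, P) = E + A * D**(-ALPHA) * P**(-BETA)
# D: adaptation tokens in billions, P: pre-training tokens per parameter.
E, A, ALPHA, BETA = 1.2, 1.65, 0.3, 0.1


def predicted_loss(D, P):
    """Predicted target-domain loss after adapting on D billion tokens."""
    return E + A * D ** (-ALPHA) * P ** (-BETA)


def tokens_needed(target_loss, P):
    """Smallest D (billions of tokens) with predicted_loss(D, P) <= target_loss."""
    if target_loss <= E:
        return math.inf  # unreachable: at or below the irreducible term E
    # Solve E + A * D**(-ALPHA) * P**(-BETA) = target_loss for D.
    return (A * P ** (-BETA) / (target_loss - E)) ** (1.0 / ALPHA)


def plan(target_loss, P, n_params, flop_budget):
    """Return (tokens_in_billions, flops) if the target fits the FLOP budget, else None.
    Cost uses the common C ~= 6 * N * D approximation for training FLOPs."""
    D = tokens_needed(target_loss, P)
    if math.isinf(D):
        return None
    flops = 6 * n_params * D * 1e9  # convert billions of tokens to tokens
    return (D, flops) if flops <= flop_budget else None


# Hypothetical scenario: 1B-parameter model, 2e20 FLOP cap, target loss 1.6.
for P in (15, 31, 279):
    print(P, plan(target_loss=1.6, P=P, n_params=1e9, flop_budget=2e20))
```

With these toy parameters, only the PTPP = 279 base model reaches the target within the cap (about 17 billion tokens); that outcome follows from the made-up exponents, not from the paper's results.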
Abstract
Continual pre-training (CPT) for domain adaptation must balance target-domain gains with stability on the base domain. Existing CPT scaling laws typically assume a fixed pre-training budget, which limits their ability to forecast adaptation outcomes for models trained at different tokens-per-parameter (PTPP) budgets. We present PTPP-aware adaptation scaling laws that make the pre-training budget an explicit variable, enabling accurate prediction of adaptation loss at unseen PTPP. On a multilingual setup (English/Arabic → French), PTPP-aware formulations fit on early stages (PTPP = {15, 31}) predict target loss at PTPP = 279 and outperform a PTPP-agnostic D-CPT transfer baseline on Huber-on-log loss, MAE, and calibration slope; full diagnostics (RMSE, MAPE) are in the appendix. Beyond forecasting, we show a practical use case: planning replay…
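The forecasting protocol described in the abstract (fit on PTPP in {15, 31}, predict at PTPP = 279, score with Huber-on-log, MAE, and calibration slope) can be sketched as follows. Everything specific here is an assumption for illustration, not the paper's formulation or data: the functional form L(D, P) = E + exp(c0) * D^c1 * P^c2, the grid search over E, the Huber threshold of 1e-3, the definition of calibration slope as the OLS slope of observed on predicted loss, and the toy losses generated from made-up parameters.

```python
import math
import random

# ASSUMPTION: an illustrative PTPP-aware form, not the paper's exact law.
#   L(D, P) = E + exp(c0) * D**c1 * P**c2
# D: adaptation tokens (billions), P: pre-training tokens per parameter (PTPP).
def predict(params, D, P):
    E, c0, c1, c2 = params
    return E + math.exp(c0 + c1 * math.log(D) + c2 * math.log(P))


def huber(r, delta=1e-3):
    """Huber loss on a residual; delta=1e-3 is an assumed threshold."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    x = [0.0] * 3
    for r in reversed(range(3)):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x


def fit(data, n_grid=400):
    """Grid-search E below the smallest observed loss; for each E, fit
    log(L - E) = c0 + c1*log D + c2*log P by least squares, and keep the
    candidate with the lowest Huber-on-log objective."""
    L_min = min(L for _, _, L in data)
    rows = [(1.0, math.log(D), math.log(P)) for D, P, _ in data]
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(3)] for a in range(3)]
    best = None
    for i in range(n_grid):
        E = L_min * i / n_grid  # E stays strictly below L_min, so L - E > 0
        ys = [math.log(L - E) for _, _, L in data]
        Xty = [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(3)]
        params = (E, *solve3(XtX, Xty))
        obj = sum(huber(math.log(predict(params, D, P) / L)) for D, P, L in data)
        if best is None or obj < best[0]:
            best = (obj, params)
    return best[1]


def metrics(params, data):
    """Huber-on-log, MAE, and calibration slope of predictions on held-out data."""
    pred = [predict(params, D, P) for D, P, _ in data]
    obs = [L for _, _, L in data]
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    # Calibration slope: OLS slope of observed on predicted (1.0 = well calibrated).
    slope = (sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
             / sum((p - mp) ** 2 for p in pred))
    return {
        "huber_log": sum(huber(math.log(p / o)) for p, o in zip(pred, obs)) / n,
        "mae": sum(abs(p - o) for p, o in zip(pred, obs)) / n,
        "calibration_slope": slope,
    }


# Toy data from made-up "true" parameters plus small multiplicative noise.
TRUE = (1.2, 0.5, -0.3, -0.1)
rng = random.Random(0)


def make(P):
    return [(D, P, predict(TRUE, D, P) * math.exp(rng.gauss(0, 0.002)))
            for D in (1, 2, 4, 8, 16, 32)]


train = make(15) + make(31)  # early pre-training stages
test = make(279)             # unseen, much larger PTPP
params = fit(train)
print(metrics(params, test))
```

Because P enters the law explicitly, the fitted c2 carries what is learned from the two early stages over to PTPP = 279; a PTPP-agnostic fit at a single PTPP has no such term to extrapolate with.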
