Predicting Large Model Test Losses with a Noisy Quadratic System

Chuning Li; Chris J. Maddison

arXiv:2605.09154·cs.LG·May 12, 2026

TL;DR

This paper presents a loss prediction model that estimates the pre-training loss of large models from model size (N), batch size (B), and number of weight updates (K). It extrapolates more accurately than Chinchilla's loss model and can be used to choose N, B, K configurations under resource constraints.

Contribution

The paper introduces what the authors describe as the first loss prediction model that can handle changing batch size. It extrapolates more accurately than Chinchilla's loss model and is proposed as an alternative to heuristic-based laws, which are growing in complexity.

Findings

01. The model outperforms Chinchilla's loss model at projecting the loss at extrapolated compute budgets (up to 1000-fold).

02. In the authors' experiments, the configurations selected by the model are close to the ground-truth optimal.

03. The implementation is publicly available on GitHub.

Abstract

We introduce a predictive model that estimates the pre-training loss of large models from model size (N), batch size (B) and number of weight updates (K). This is the first loss prediction model that can handle changing batch size. The model outperforms Chinchilla's loss model, a model of the test loss using the model size and number of tokens, in terms of projecting the loss at extrapolated compute budgets (up to 1000-fold). A natural use of the model is to find optimal N, B, K configurations under explicit and compound resource constraints such as time, memory and compute. In our experiments, the model-selected configurations are close to ground-truth optimal. Our work advocates for loss prediction as a better alternative to heuristic-based laws, which are growing in complexity. The implementation is available at https://github.com/chuningxdy/Noisy-Quadratic-System.
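
The configuration search mentioned in the abstract can be illustrated with a minimal sketch: given any predictor of loss as a function of (N, B, K), enumerate candidate configurations, discard those that violate the resource constraints, and keep the one with the lowest predicted loss. Everything below is an assumption made for illustration. `placeholder_loss` is a hypothetical stand-in with a Chinchilla-style functional form and made-up constants; it is not the paper's noisy quadratic system and not the API of the linked repository. `SEQ_LEN`, the 6 * N * tokens FLOP estimate (a common rule of thumb), and the use of K as a proxy for wall-clock time are likewise assumptions.

```python
from itertools import product

SEQ_LEN = 2048  # assumed tokens per sequence (hypothetical value)


def placeholder_loss(n_params, batch_size, num_updates):
    """Hypothetical stand-in predictor, NOT the paper's model.

    Uses a Chinchilla-style form in model size and total tokens,
    with made-up constants chosen only so the example runs.
    """
    tokens = batch_size * num_updates * SEQ_LEN
    return 1.7 + 400.0 / n_params**0.34 + 410.0 / tokens**0.28


def training_flops(n_params, batch_size, num_updates):
    """Rule-of-thumb compute estimate: about 6 FLOPs per parameter per token."""
    tokens = batch_size * num_updates * SEQ_LEN
    return 6.0 * n_params * tokens


def select_config(loss_fn, sizes, batches, updates, max_flops, max_updates):
    """Return (predicted_loss, N, B, K) for the best feasible grid point.

    Returns None when no candidate satisfies both constraints.
    """
    best = None
    for n, b, k in product(sizes, batches, updates):
        # Compute constraint, plus a cap on sequential updates as a time proxy.
        if training_flops(n, b, k) > max_flops or k > max_updates:
            continue
        candidate = (loss_fn(n, b, k), n, b, k)
        if best is None or candidate < best:
            best = candidate
    return best


if __name__ == "__main__":
    result = select_config(
        placeholder_loss,
        sizes=[1e8, 1e9],
        batches=[256, 1024],
        updates=[1000, 10000],
        max_flops=1e20,
        max_updates=10000,
    )
    print(result)
```

Because the placeholder depends on B and K only through their product, it cannot distinguish a large batch with few updates from a small batch with many updates at the same token count. A predictor that takes N, B and K as separate inputs, as the abstract describes, could be passed in as `loss_fn` without changing the search.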

Code & Models

Repositories

chuningxdy/Noisy-Quadratic-System
github