Problems with Chinchilla Approach 2: Systematic Biases in IsoFLOP Parabola Fits

Eric Czech; Zhiwei Xu; Yael Elmatad; Yixin Wang; William Held

arXiv:2603.22339 · cs.LG · March 31, 2026

TL;DR

This paper shows that Chinchilla Approach 2, a widely used method for fitting neural scaling laws, yields systematically biased compute-optimal allocation estimates even on noise-free synthetic data, and presents Variable Projection as an unbiased, well-conditioned alternative for inferring the loss surface parameters.

Contribution

It introduces Variable Projection to address biases in Approach 2, offering a more stable, unbiased, and scalable method for neural scaling law fitting.

Findings

01. Chinchilla Approach 2 introduces biases leading to underallocation and unnecessary compute.

02. Approach 3 reduces biases but has perceived drawbacks that are addressable.

03. Variable Projection enables unbiased, well-conditioned inference on all loss surface parameters.
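
To make the third finding concrete, here is a minimal sketch of variable projection in its standard form. It assumes the usual Chinchilla loss surface L(N, D) = E + A·N^(-α) + B·D^(-β), which is not spelled out on this page. Once (α, β) are fixed, that surface is linear in (E, A, B), so the linear parameters can be solved exactly by least squares and only the two exponents need a nonlinear search. The function names, the coarse grid search over exponents, and the synthetic constants are all illustrative assumptions; this is not the API of the Open-Athena/vpnls repository, which remains the reference implementation.

```python
import numpy as np

def solve_linear_part(alpha, beta, N, D, L):
    """For fixed exponents, fit the linear parameters (E, A, B) by least squares."""
    X = np.column_stack([np.ones_like(N), N ** (-alpha), D ** (-beta)])
    coef, *_ = np.linalg.lstsq(X, L, rcond=None)
    resid = L - X @ coef
    return float(resid @ resid), coef

def fit_variable_projection(N, D, L, grid=None):
    """Search only over (alpha, beta); (E, A, B) are projected out at each point.

    A plain grid search stands in for a proper optimizer (assumption made
    to keep the sketch short and dependency-free).
    """
    if grid is None:
        grid = np.linspace(0.05, 1.0, 96)  # exponent candidates, step 0.01
    best_rss, best = np.inf, None
    for alpha in grid:
        for beta in grid:
            rss, (E, A, B) = solve_linear_part(alpha, beta, N, D, L)
            if rss < best_rss:
                best_rss = rss
                best = {"E": E, "A": A, "B": B, "alpha": alpha, "beta": beta}
    return best

# Noise-free synthetic data (illustrative constants), with C = 6 * N * D assumed.
budgets = np.array([1e19, 1e20, 1e21, 1e22])
N = np.tile(np.logspace(8, 10, 7), len(budgets))
C = np.repeat(budgets, 7)
D = C / (6.0 * N)
L = 1.69 + 406.4 * N ** -0.34 + 410.7 * D ** -0.28

fit = fit_variable_projection(N, D, L)
print(fit)
```

The design point is that the nonlinear search is two-dimensional rather than five-dimensional, which is what makes the problem easier to condition and to initialize.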

Abstract

Chinchilla Approach 2 is among the most widely used methods for fitting neural scaling laws. Its parabolic approximation introduces systematic biases in compute-optimal allocation estimates, even on noise-free synthetic data. Applied to published Llama 3 IsoFLOP data at open frontier compute scales, these biases imply a parameter underallocation corresponding to 6.5% of the $3.8 \times 10^{25}$ FLOP training budget and $1.4M (90% CI: $412K-$2.9M) in unnecessary compute at 50% H100 MFU. Simulated multimodal model misallocations show even greater opportunity costs due to higher loss surface asymmetry. Three sources of this error are examined: IsoFLOP sampling grid width (Taylor approximation accuracy), uncentered IsoFLOP sampling, and loss surface asymmetry ($\alpha \neq \beta$). Chinchilla Approach 3 largely eliminates these biases but is often regarded as less data-efficient,…
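
The three error sources named in the abstract can be reproduced on noise-free data with a short sketch. It assumes the standard Chinchilla surface L(N, D) = E + A·N^(-α) + B·D^(-β) and the approximation C = 6·N·D, neither of which is stated on this page, and the constants are illustrative. Under those assumptions the IsoFLOP curve in log N is a sum of two exponentials; it is symmetric about the optimum only when α = β, so a parabola fitted on a centered grid recovers the optimum exactly in that case and drifts otherwise. The function names and the `width` and `offset` parameters are hypothetical.

```python
import numpy as np

E, A, B = 1.69, 406.4, 410.7  # illustrative loss surface constants (assumption)

def true_optimal_n(C, alpha, beta):
    """Closed-form minimizer of E + A*N^-alpha + B*(C/(6N))^-beta over N."""
    scale = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    return scale * (C / 6.0) ** (beta / (alpha + beta))

def approach2_optimal_n(C, alpha, beta, width=8.0, offset=1.0, n_points=9):
    """Fit a parabola to loss vs. log N along one IsoFLOP curve; return its vertex.

    width  : the grid spans [center / width, center * width] in N
    offset : multiplicative shift of the grid center away from the true optimum
    """
    center = np.log(offset * true_optimal_n(C, alpha, beta))
    log_n = center + np.linspace(-np.log(width), np.log(width), n_points)
    N = np.exp(log_n)
    D = C / (6.0 * N)
    loss = E + A * N ** (-alpha) + B * D ** (-beta)
    x = log_n - center  # center x to keep the polynomial fit well-conditioned
    c2, c1, _ = np.polyfit(x, loss, 2)
    return float(np.exp(center - c1 / (2.0 * c2)))

def relative_bias(C, alpha, beta, **kwargs):
    """Relative error of the parabola-vertex estimate against the true optimum."""
    return approach2_optimal_n(C, alpha, beta, **kwargs) / true_optimal_n(C, alpha, beta) - 1.0

C = 1e21
print("symmetric, centered :", relative_bias(C, 0.30, 0.30))
print("asymmetric, centered:", relative_bias(C, 0.34, 0.28))
print("asymmetric, wide    :", relative_bias(C, 0.34, 0.28, width=16.0))
print("symmetric, off-center:", relative_bias(C, 0.30, 0.30, offset=3.0))
```

The sign and size of the bias depend on the chosen exponents and grid, so the numbers printed here say nothing about the Llama 3 figures quoted in the abstract.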

Code & Models

Repositories

Open-Athena/vpnls
github

Datasets

open-athena/isoflop-experiments
dataset· 58 dl