Predictable Scale: Part II, Farseer: A Refined Scaling Law in Large Language Models

Houyi Li; Wenzhen Zheng; Qiufeng Wang; Zhenyu Ding; Haoying Wang; Zili Wang; Shijie Xuyang; Ning Ding; Shuigeng Zhou; Xiangyu Zhang; Daxin Jiang

arXiv:2506.10972 · cs.LG · July 17, 2025

TL;DR

Farseer is a refined scaling law for large language models that significantly improves predictive accuracy over previous laws, enabling reliable extrapolation from small-scale experiments to large-scale training.

Contribution

We introduce Farseer, a new scaling law that offers more accurate and generalizable predictions of LLM performance across scales, surpassing prior scaling laws such as Chinchilla's law.

Findings

01

Farseer reduces extrapolation error by 433% relative to Chinchilla's law.

02

It accurately predicts performance across diverse model scales.

03

Validated on ~1,000 models trained using about 3 million GPU hours.

Abstract

Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing a model loss surface $L(N, D)$, Farseer achieves a significantly better fit to empirical data than prior laws (e.g., Chinchilla's law). Our methodology yields accurate, robust, and highly generalizable predictions, demonstrating excellent extrapolation capabilities, improving upon Chinchilla's law by reducing extrapolation error by 433%. This allows for the reliable evaluation of competing training strategies across all $(N, D)$ settings, enabling conclusions from small-scale ablation studies…
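The workflow the abstract describes can be sketched in a few lines: fit a loss surface $L(N, D)$ on small-scale runs, extrapolate to a much larger configuration, and measure the error. This is a minimal sketch under explicit assumptions. It uses Chinchilla's law, the baseline named above, and not Farseer's own functional form, which this page does not state. The losses are synthetic and noise-free, generated from Chinchilla's published constants. The grid-search-plus-least-squares fit and the helper names (`chinchilla_loss`, `fit_chinchilla`, `solve3`) are illustrative choices, not the authors' procedure. Relative error at the extrapolated point is one plausible way to quantify extrapolation error, not necessarily the metric behind the 433% figure.

```python
import itertools

# Synthetic "ground truth": Chinchilla's law L(N, D) = E + A/N**alpha + B/D**beta
# with the constants reported by Hoffmann et al. (2022). These are NOT Farseer's
# parameters; they only generate noise-free example losses for this sketch.
TRUE = dict(E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28)


def chinchilla_loss(n, d, E, A, B, alpha, beta):
    """Chinchilla-style loss surface L(N, D) for N parameters and D training tokens."""
    return E + A / n**alpha + B / d**beta


def solve3(m, v):
    """Solve the 3x3 system m @ x = v by Gaussian elimination with partial pivoting."""
    a = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_chinchilla(points):
    """Fit (E, A, B, alpha, beta) to (N, D, loss) triples.

    For fixed exponents the model is linear in (E, A, B), so we grid-search
    alpha and beta and solve ordinary least squares for the rest.
    """
    grid = [i / 100 for i in range(10, 61)]  # exponents 0.10 ... 0.60
    best_sse, best = float("inf"), None
    for alpha, beta in itertools.product(grid, grid):
        feats = [(1.0, n**-alpha, d**-beta) for n, d, _ in points]
        ys = [y for _, _, y in points]
        # Normal equations: (X^T X) w = X^T y
        xtx = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
        xty = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(3)]
        E, A, B = solve3(xtx, xty)
        sse = sum((chinchilla_loss(n, d, E, A, B, alpha, beta) - y) ** 2
                  for n, d, y in points)
        if sse < best_sse:
            best_sse, best = sse, dict(E=E, A=A, B=B, alpha=alpha, beta=beta)
    return best


# Hypothetical "small-scale experiments": 100M-800M parameters, 2B-16B tokens.
small = [(n, d, chinchilla_loss(n, d, **TRUE))
         for n in (1e8, 2e8, 4e8, 8e8) for d in (2e9, 4e9, 8e9, 1.6e10)]
params = fit_chinchilla(small)

# Extrapolate to a much larger hypothetical run (70B parameters, 1.4T tokens).
N_big, D_big = 7e10, 1.4e12
predicted = chinchilla_loss(N_big, D_big, **params)
actual = chinchilla_loss(N_big, D_big, **TRUE)
rel_error = abs(predicted - actual) / actual
print(f"alpha={params['alpha']:.2f} beta={params['beta']:.2f} "
      f"predicted={predicted:.4f} actual={actual:.4f} rel_error={rel_error:.2e}")
```

The data here are noise-free and come from the same functional form that is being fitted, so the fit recovers the exponents almost exactly and the extrapolation error is tiny. With real training runs, the assumed form may not match the empirical loss surface. That mismatch is the gap the abstract says Farseer's better fit is meant to close.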

Code & Models

Repositories

farseer-scaling-law/farseer (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques