Predictable Scale: Part II, Farseer: A Refined Scaling Law in Large Language Models
Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang

TL;DR
Farseer is a refined scaling law for large language models that significantly improves predictive accuracy over previous laws, enabling reliable extrapolation from small-scale experiments to large-scale training.
Contribution
We introduce Farseer, a new scaling law that offers more accurate and generalizable predictions of LLM performance across scales, surpassing prior models like Chinchilla's law.
Findings
Farseer reduces extrapolation error by 433% relative to Chinchilla's law.
It accurately predicts performance across diverse model scales.
Validated with ~1,000 trained models, consuming about 3 million GPU hours in total.
Abstract
Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this gap, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing a model loss surface, Farseer achieves a significantly better fit to empirical data than prior laws (e.g., Chinchilla's law). Our methodology yields accurate, robust, and highly generalizable predictions, demonstrating excellent extrapolation capabilities, improving upon Chinchilla's law by reducing extrapolation error by 433%. This allows for the reliable evaluation of competing training strategies across all settings, enabling conclusions from small-scale ablation studies…
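For context on the baseline the abstract compares against, here is a minimal sketch of the Chinchilla-style parametric loss, L(N, D) = E + A/N^alpha + B/D^beta, together with one common way to score an extrapolated prediction. It does not implement Farseer, whose functional form is not given on this page. The default coefficients are the fitted values reported by Hoffmann et al. (2022); the function names and the "observed" loss in the usage lines are hypothetical placeholders.

```python
def chinchilla_loss(n_params, n_tokens, E=1.69, A=406.4, B=410.7,
                    alpha=0.34, beta=0.28):
    """Chinchilla-style loss surface: E + A / N**alpha + B / D**beta.

    Defaults are the coefficients reported by Hoffmann et al. (2022);
    they serve only as an illustrative baseline, not as Farseer's fit.
    """
    return E + A / n_params ** alpha + B / n_tokens ** beta


def relative_error(predicted, observed):
    """Absolute prediction error as a fraction of the observed value."""
    return abs(predicted - observed) / observed


if __name__ == "__main__":
    # Predict the loss of a larger model from the fitted law.
    predicted = chinchilla_loss(n_params=7e9, n_tokens=1.4e11)

    # Hypothetical placeholder, NOT a measurement from the paper.
    observed = 2.0

    print(f"predicted loss: {predicted:.4f}")
    print(f"relative extrapolation error: {relative_error(predicted, observed):.2%}")
```

In practice the coefficients would be fitted on small models only, and the relative error would then be measured on held-out larger models; a law with lower error on those held-out points extrapolates better.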
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
