Configuration-to-Performance Scaling Law with Neural Ansatz
Huaqing Zhang, Kaiyue Wen, Tengyu Ma

TL;DR
This paper introduces a neural scaling law that predicts training performance from the full training configuration, rather than from model size N and data size D alone. It supports joint hyperparameter tuning and loss-curve prediction, and it predicts more accurately than the Chinchilla law.
Contribution
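The authors propose a Neural Configuration-to-Performance Scaling Law (NCPL): a large language model, fit on open-source pretraining logs from multiple sources, that maps a complete training configuration to training performance. It achieves lower prediction error than the Chinchilla law and also predicts loss curves.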
Findings
NCPL achieves 20-40% lower prediction error than the Chinchilla law.
It generalizes to runs that use up to 10x more compute than the runs in its training data.
It supports joint hyperparameter tuning and loss-curve prediction.
Abstract
Researchers build scaling laws to forecast the training performance of expensive large-scale runs with larger model size N and data size D. These laws assume that other training hyperparameters are optimally chosen, which can require significant effort and, in some cases, be impossible due to external hardware constraints. To improve predictability across a broader set of hyperparameters and enable simpler tuning at scale, we propose learning a Configuration-to-Performance Scaling Law (CPL): a mapping from the full training configuration to training performance. Because no simple functional form can express this mapping, we parameterize it with a large language model (LLM), and fit it with diverse open-source pretraining logs across multiple sources, yielding a Neural Configuration-to-Performance Scaling Law (NCPL). NCPL accurately predicts how training…
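For context, the Chinchilla law that NCPL is compared against is usually written as L(N, D) = E + A / N^alpha + B / D^beta, a function of model size N and data size D only. The sketch below is a minimal illustration of that baseline form, not code from this paper. The class name `ChinchillaLaw` is hypothetical, and the default coefficients are the fitted values reported by Hoffmann et al. (2022), assumed here purely for illustration; this paper may fit its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChinchillaLaw:
    """Parametric loss L(N, D) = E + A / N**alpha + B / D**beta.

    Defaults are the fitted values reported by Hoffmann et al. (2022).
    They are an assumption for illustration, not values from this paper.
    """
    E: float = 1.69       # irreducible loss term
    A: float = 406.4      # coefficient of the model-size term
    B: float = 410.7      # coefficient of the data-size term
    alpha: float = 0.34   # model-size exponent
    beta: float = 0.28    # data-size exponent

    def loss(self, n_params: float, n_tokens: float) -> float:
        # Only N and D enter the prediction. Learning rate, batch size,
        # and every other hyperparameter are invisible to this form.
        return (self.E
                + self.A / n_params ** self.alpha
                + self.B / n_tokens ** self.beta)


law = ChinchillaLaw()
small = law.loss(n_params=1e8, n_tokens=2e9)    # 100M parameters, 2B tokens
large = law.loss(n_params=1e9, n_tokens=2e10)   # 1B parameters, 20B tokens
print(f"predicted loss, small run: {small:.3f}")
print(f"predicted loss, large run: {large:.3f}")
```

Two runs with the same N and D but different learning rates or batch sizes receive the same prediction under this form, which is the gap a full-configuration mapping is meant to close.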
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Neural Network Applications · Machine Learning and Data Classification
