Neural Neural Scaling Laws
Michael Y. Hu, Jane Pan, Ayush Rajesh Jhaveri, Nicholas Lourie, Kyunghyun Cho

TL;DR
This paper introduces NeuNeu, a neural network model that predicts downstream task performance of language models by extrapolating accuracy trajectories, outperforming traditional parametric scaling laws.
Contribution
NeuNeu frames scaling law prediction as time-series extrapolation, combining token-level validation losses with accuracy trajectories, and generalizes across models and tasks.
Findings
NeuNeu achieves a mean absolute error (MAE) of 1.99% when predicting accuracy, a 44% improvement over logistic scaling laws.
NeuNeu generalizes zero-shot to unseen models, architectures, and tasks.
Trained on open-source checkpoints, NeuNeu outperforms traditional parametric models.
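To make the headline metric concrete, here is a minimal sketch of how an MAE and a relative improvement over a baseline could be computed. It assumes the 44% figure is a relative reduction in MAE, which this summary does not state explicitly. The function names and all numbers below are hypothetical toy values, not results from the paper.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute gap between predicted and observed accuracy (same units as inputs)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def relative_improvement(baseline_mae, model_mae):
    """Fractional reduction in MAE relative to the baseline (assumed definition)."""
    return (baseline_mae - model_mae) / baseline_mae


# Toy accuracies in percentage points; purely illustrative.
observed = [50.0, 60.0, 70.0, 80.0]
baseline_pred = [54.0, 56.0, 74.0, 76.0]  # stand-in for a parametric law
model_pred = [52.0, 58.0, 72.0, 78.0]     # stand-in for a learned predictor

baseline_mae = mean_absolute_error(baseline_pred, observed)
model_mae = mean_absolute_error(model_pred, observed)
improvement = relative_improvement(baseline_mae, model_mae)
print(baseline_mae, model_mae, improvement)
```

With these toy values the baseline is off by 4 points on average and the stand-in model by 2, so the relative improvement is 50%.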
Abstract
Neural scaling laws predict how language model performance improves with increased training inputs. While aggregate metrics like validation loss can follow smooth power-law curves, individual downstream tasks exhibit diverse scaling behaviors: some improve monotonically, others plateau, and some even degrade with scale. We argue that predicting downstream performance from validation loss suffers from two limitations: averaging token-level losses obscures signal, and no simple parametric family can capture the full spectrum of scaling behaviors. To address this, we propose Neural Neural Scaling Laws (NeuNeu), a neural network that frames scaling law prediction as time-series extrapolation. NeuNeu combines temporal context from observed accuracy trajectories with token-level validation losses, learning to predict future performance without the limitations inherent in assuming a specific…
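The claim that averaging token-level losses obscures signal can be illustrated with a small sketch. The two loss lists are hypothetical, and `fraction_below` is an illustrative helper, not a feature the paper is known to use. The point is only that two checkpoints can share the same mean validation loss while their per-token distributions differ.

```python
from statistics import mean


def fraction_below(losses, threshold):
    """Share of tokens whose loss is strictly under `threshold`."""
    return sum(1 for x in losses if x < threshold) / len(losses)


# Hypothetical per-token validation losses for two checkpoints.
uniform_losses = [2.0, 2.0, 2.0, 2.0]  # every token equally uncertain
bimodal_losses = [0.5, 0.5, 3.5, 3.5]  # half nearly solved, half very hard

mean_uniform = mean(uniform_losses)
mean_bimodal = mean(bimodal_losses)
easy_uniform = fraction_below(uniform_losses, 1.0)
easy_bimodal = fraction_below(bimodal_losses, 1.0)

print(mean_uniform, mean_bimodal)  # identical averages
print(easy_uniform, easy_bimodal)  # different token-level picture
```

A predictor that sees only the average cannot tell these two checkpoints apart, whereas one that sees the token-level losses can.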
