Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments

Maor Ivgi; Yair Carmon; Jonathan Berant

arXiv:2202.06387 · cs.CL · October 19, 2022

Open Access

TL;DR

This paper empirically investigates neural scaling laws in language models. It shows that they can predict the performance of larger models and help debug training, but that revealing them requires careful hyperparameter tuning and multiple runs, which adds computational cost.

Contribution

The paper demonstrates that scaling laws emerge at finetuning time in some NLP tasks and that they are useful for performance prediction and debugging, and it highlights the practical cost of revealing them.

Findings

01. Scaling laws emerge at finetuning in some NLP tasks.

02. Scaling laws can predict larger model performance.

03. Careful hyperparameter tuning is necessary for revealing scaling laws.

Abstract

Neural scaling laws define a predictable relationship between a model's parameter count and its performance after training in the form of a power law. However, most research to date has not explicitly investigated whether scaling laws can be used to accelerate model development. In this work, we perform such an empirical investigation across a wide range of language understanding tasks, starting from models with as few as 10K parameters, and evaluate downstream performance across 9 language understanding tasks. We find that scaling laws emerge at finetuning time in some NLP tasks, and that they can also be exploited for debugging convergence when training large models. Moreover, for tasks where scaling laws exist, they can be used to predict the performance of larger models, which enables effective model selection. However, revealing scaling laws requires careful hyperparameter tuning…
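The prediction idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest power-law form, loss ≈ c · N^(−alpha), where N is the parameter count, and fits it by least squares in log-log space. The function names, the constants, and the synthetic data points are hypothetical and chosen only for illustration. The paper's actual functional form and fitting procedure may differ, for example by including an irreducible-loss term or by estimating uncertainty over multiple runs.

```python
import numpy as np


def fit_power_law(param_counts, losses):
    """Fit loss ~= c * N**(-alpha) by linear least squares in log-log space.

    Returns (c, alpha). Assumes all inputs are strictly positive.
    """
    log_n = np.log(np.asarray(param_counts, dtype=float))
    log_loss = np.log(np.asarray(losses, dtype=float))
    # log(loss) = log(c) - alpha * log(N), so the slope is -alpha.
    slope, intercept = np.polyfit(log_n, log_loss, deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict_loss(c, alpha, param_count):
    """Extrapolate the fitted power law to a new parameter count."""
    return c * float(param_count) ** (-alpha)


# Synthetic, noise-free "small-scale" measurements (hypothetical values),
# generated from c = 5.0 and alpha = 0.1 purely to exercise the fit.
small_n = np.array([1e4, 1e5, 1e6, 1e7])
small_loss = 5.0 * small_n ** (-0.1)

c, alpha = fit_power_law(small_n, small_loss)
predicted = predict_loss(c, alpha, 1e9)  # extrapolate to a larger model
print(f"c={c:.3f}, alpha={alpha:.3f}, predicted loss at 1e9 params={predicted:.3f}")
```

With real measurements the points are noisy, so a single fit like this can be misleading. Repeating the fit over several runs and hyperparameter settings, as the abstract cautions, gives a sense of how much to trust the extrapolation.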

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Materials Science