Is the Number of Trainable Parameters All That Actually Matters?
Amélie Chatelain, Amine Djeghri, Daniel Hesslow, Julien Launay, Iacopo Poli

TL;DR
This paper asks whether the number of trainable parameters alone determines how language-model performance scales. It finds that the scaling laws depend only on trainable parameters and cannot be fooled by adding spurious parameters, whether frozen or structured.
Contribution
It demonstrates that effective model scaling is fundamentally tied to the number of trainable parameters. This holds even when efficient approximations, such as frozen random parameters or fast structured transforms, are used to emulate larger models.
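The distinction between total and trainable parameters can be made concrete with a small sketch. The class `DopedLinear` and its arguments (`frozen_fraction`, `seed`) are hypothetical and are not the authors' implementation. It assumes that "doping" means fixing a random subset of a dense layer's weights at their random initialization and never updating them. The layer still reports its full size through `total_params()`. Under the paper's finding, however, `trainable_params()` is the count that matters for scaling.

```python
import random


class DopedLinear:
    """Hypothetical linear layer whose weights are partly frozen random values.

    Illustrative sketch only, not the authors' implementation: a fixed random
    subset of the weight entries is excluded from gradient updates.
    """

    def __init__(self, n_in, n_out, frozen_fraction, seed=0):
        rng = random.Random(seed)
        self.n_in, self.n_out = n_in, n_out
        n_total = n_in * n_out
        n_frozen = round(frozen_fraction * n_total)
        # Pick which flattened weight indices are frozen.
        frozen_idx = set(rng.sample(range(n_total), n_frozen))
        self.frozen = [i in frozen_idx for i in range(n_total)]
        # All weights start random; frozen entries keep this value forever.
        self.weights = [rng.gauss(0.0, n_in ** -0.5) for _ in range(n_total)]

    def total_params(self):
        # Size of the emulated dense layer.
        return self.n_in * self.n_out

    def trainable_params(self):
        # Entries that actually receive gradient updates.
        return self.frozen.count(False)

    def sgd_step(self, grads, lr):
        # Plain SGD on trainable entries only; frozen entries are skipped.
        for i, g in enumerate(grads):
            if not self.frozen[i]:
                self.weights[i] -= lr * g


if __name__ == "__main__":
    layer = DopedLinear(64, 32, frozen_fraction=0.75)
    print(layer.total_params(), layer.trainable_params())  # 2048 512
```

In this example the layer looks four times larger than its trainable core. The paper's result suggests it would scale like a layer with 512 parameters, not 2048.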
Findings
Scaling laws depend only on trainable parameters.
Frozen or structured parameters do not affect the scaling relationship.
Scaling laws cannot be deceived by spurious parameters.
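Parameter-count scaling laws are commonly written as a power law, L(N) = (N_c / N)^α, where L is the test loss and N the number of parameters. The sketch below fits N_c and α by least squares in log-log space, since log L = α log N_c − α log N is a line with slope −α. The function names are illustrative. Following the findings above, N should be the trainable-parameter count. The points in the usage example are synthetic, generated from a known law only to show that the fit recovers it; they are not results from the paper.

```python
import math


def power_law_loss(n_params, n_c, alpha):
    """Loss predicted by L(N) = (N_c / N) ** alpha."""
    return (n_c / n_params) ** alpha


def fit_power_law(n_params, losses):
    """Fit L = (N_c / N) ** alpha by ordinary least squares on (log N, log L).

    log L = alpha * log N_c - alpha * log N, so the slope is -alpha and the
    intercept is alpha * log N_c.
    """
    xs = [math.log(n) for n in n_params]
    ys = [math.log(loss) for loss in losses]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    slope = cov / var
    intercept = y_mean - slope * x_mean
    alpha = -slope
    n_c = math.exp(intercept / alpha)
    return n_c, alpha


if __name__ == "__main__":
    # Synthetic points from a known law (N_c = 1e13, alpha = 0.08),
    # used only to check that the fit recovers the parameters.
    counts = [1e6, 1e7, 1e8, 1e9]  # trainable-parameter counts
    losses = [power_law_loss(n, 1e13, 0.08) for n in counts]
    print(fit_power_law(counts, losses))
```

If frozen parameters were added to N, the points would shift right without any change in loss. The paper's finding implies this would not place a model lower on the trainable-parameter curve.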
Abstract
Recent work has identified simple empirical scaling laws for language models, linking compute budget, dataset size, model size, and autoregressive modeling loss. The validity of these simple power laws across orders of magnitude in model scale provides compelling evidence that larger models are also more capable models. However, scaling up models under the constraints of hardware and infrastructure is no easy feat, and rapidly becomes a hard and expensive engineering problem. We investigate ways to tentatively cheat scaling laws, and train larger models for cheaper. We emulate an increase in effective parameters, using efficient approximations: either by doping the models with frozen random parameters, or by using fast structured transforms in place of dense linear layers. We find that the scaling relationship between test loss and compute depends only on the actual number of trainable…
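The abstract's second approach replaces dense linear layers with fast structured transforms. This summary does not say which transforms the authors used. The sketch below therefore uses a Walsh-Hadamard-based layer as an assumed stand-in. The hypothetical `HadamardDiagLayer` computes y = D2 · H · D1 · x / sqrt(n), where H is a fixed Hadamard matrix applied in O(n log n) time and only the diagonals D1 and D2 are trainable.

```python
def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in O(n log n).

    Returns a new list; len(x) must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        # Butterfly step: combine pairs of entries that are h apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y


class HadamardDiagLayer:
    """Hypothetical structured layer y = d2 * H(d1 * x) / sqrt(n).

    Not the paper's transform: an assumed example of a fast structured
    replacement for an n x n dense layer.
    """

    def __init__(self, n):
        self.n = n
        self.d1 = [1.0] * n  # trainable input diagonal
        self.d2 = [1.0] * n  # trainable output diagonal

    def trainable_params(self):
        return 2 * self.n

    def forward(self, x):
        scaled = [d * v for d, v in zip(self.d1, x)]
        mixed = fwht(scaled)  # fixed, parameter-free mixing of all inputs
        norm = self.n ** 0.5  # makes H / sqrt(n) orthonormal
        return [d * v / norm for d, v in zip(self.d2, mixed)]


if __name__ == "__main__":
    layer = HadamardDiagLayer(1024)
    # Trainable parameters vs. an equivalent dense layer.
    print(layer.trainable_params(), 1024 * 1024)  # 2048 1048576
```

Like a dense layer, this layer mixes every input into every output. It has 2n trainable parameters instead of n². Under the paper's finding, its scaling behaviour would be expected to track the 2n trainable parameters rather than the n² entries of the dense matrix it emulates.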
