Is the Number of Trainable Parameters All That Actually Matters?

Amélie Chatelain; Amine Djeghri; Daniel Hesslow; Julien Launay; Iacopo Poli

arXiv:2109.11928 · stat.ML · September 27, 2021 · 1 citation

Open Access

TL;DR

This paper investigates whether scaling laws can be cheated by inflating a model's parameter count with frozen random parameters or fast structured transforms. It finds that the scaling relationship between test loss and compute depends only on the number of trainable parameters and cannot be fooled by spurious ones.

Contribution

It demonstrates that effective model scaling is tied to trainable parameters: emulating a larger model with frozen random parameters, or with fast structured transforms in place of dense linear layers, does not change the scaling relationship.

Findings

01. Scaling laws depend only on trainable parameters.

02. Frozen or structured parameters do not affect the scaling relationship.

03. Scaling laws cannot be deceived by spurious parameters.
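
To make the notion of a scaling law concrete, the sketch below fits a power law of the form loss = coeff * N^(-alpha) by linear regression in log-log space. It is only an illustration: the data points are synthetic, the exponent and coefficient are arbitrary values chosen for the example rather than numbers from the paper, and `fit_power_law` is a hypothetical helper name.

```python
import numpy as np

def fit_power_law(n_params, losses):
    """Fit losses ~ coeff * n_params ** (-alpha) via a straight line in log-log space."""
    slope, intercept = np.polyfit(np.log(n_params), np.log(losses), deg=1)
    return -slope, np.exp(intercept)

# Synthetic points generated from an arbitrary power law (NOT data from the paper).
n_trainable = np.array([1e6, 1e7, 1e8, 1e9])
synthetic_loss = 12.0 * n_trainable ** -0.07

alpha, coeff = fit_power_law(n_trainable, synthetic_loss)
print(f"alpha={alpha:.3f}, coeff={coeff:.2f}")
```

Under the findings above, the quantity to put on the x-axis of such a fit would be the trainable parameter count rather than the total count.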

Abstract

Recent work has identified simple empirical scaling laws for language models, linking compute budget, dataset size, model size, and autoregressive modeling loss. The validity of these simple power laws across orders of magnitude in model scale provides compelling evidence that larger models are also more capable models. However, scaling up models under the constraints of hardware and infrastructure is no easy feat, and rapidly becomes a hard and expensive engineering problem. We investigate ways to tentatively cheat scaling laws, and train larger models for cheaper. We emulate an increase in effective parameters, using efficient approximations: either by doping the models with frozen random parameters, or by using fast structured transforms in place of dense linear layers. We find that the scaling relationship between test loss and compute depends only on the actual number of trainable…
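
The distinction between total and trainable parameters can be illustrated with a minimal sketch. It assumes one possible form of "doping", in which a dense layer's output width is extended with frozen random columns; this is an assumption made for illustration and not necessarily the construction used in the paper. All function names and layer sizes are hypothetical.

```python
import numpy as np

def make_doped_layer(d_in, d_out_trainable, d_out_frozen, seed=0):
    """Dense layer whose weight matrix has a trainable block and a frozen random block."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d_in)
    return {
        # Updated by the optimizer during training.
        "trainable": rng.normal(0.0, scale, size=(d_in, d_out_trainable)),
        # Drawn once at initialization and never updated.
        "frozen": rng.normal(0.0, scale, size=(d_in, d_out_frozen)),
    }

def forward(layer, x):
    """Apply the full weight matrix (trainable and frozen columns side by side)."""
    w = np.concatenate([layer["trainable"], layer["frozen"]], axis=1)
    return x @ w

def count_parameters(layer):
    """Return (trainable, total) parameter counts."""
    trainable = layer["trainable"].size
    total = trainable + layer["frozen"].size
    return trainable, total

layer = make_doped_layer(d_in=64, d_out_trainable=32, d_out_frozen=96)
out = forward(layer, np.ones((4, 64)))
trainable, total = count_parameters(layer)
print(out.shape, trainable, total)
```

In this toy layer the total count is four times the trainable count, yet according to the findings reported here only the trainable count would govern where the model sits on the scaling curve.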

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification

Methods: Test