(Mis)Fitting: A Survey of Scaling Laws

Margaret Li; Sneha Kudugunta; Luke Zettlemoyer

arXiv:2502.18969 · cs.LG · February 27, 2025

Open Access · 1 Repo · 1 Model · 1 Dataset

TL;DR

This survey critically examines how various factors influence the derivation of scaling laws in foundation models, highlighting discrepancies in prior research and proposing guidelines for reproducibility.

Contribution

The paper provides a comprehensive review of over 50 studies on scaling laws, analyzes the impact of methodological differences, and introduces a checklist to improve reproducibility in scaling law research.

Findings

01. Most studies use power laws to describe scaling trends.

02. Methodological differences significantly affect scaling law conclusions.

03. Many papers lack crucial details for reproducibility.
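
To make the first two findings concrete, here is a minimal sketch of a power-law fit in plain Python. It is not the paper's code: the functional form L(N) = E + A * N^(-alpha), the helper name `fit_power_law`, and every number are assumptions chosen for illustration on synthetic data. The sketch fits the same data twice, once after subtracting the irreducible loss E and once ignoring it, to show how one methodological choice can change the fitted exponent.

```python
import math


def fit_power_law(ns, losses, irreducible=0.0):
    """Least-squares line fit in log-log space.

    Fits log(loss - irreducible) = log(A) - alpha * log(N)
    and returns (A, alpha). Hypothetical helper for illustration.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(loss - irreducible) for loss in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, made-up data generated from L(N) = E + A * N^(-alpha).
E_TRUE, A_TRUE, ALPHA_TRUE = 1.7, 400.0, 0.34
model_sizes = [1e6, 1e7, 1e8, 1e9]
losses = [E_TRUE + A_TRUE * n ** (-ALPHA_TRUE) for n in model_sizes]

# Fit 1: subtract the (here, known) irreducible loss before fitting.
a_with, alpha_with = fit_power_law(model_sizes, losses, irreducible=E_TRUE)

# Fit 2: treat the raw loss as a pure power law, ignoring E.
a_without, alpha_without = fit_power_law(model_sizes, losses)

print(f"with E:    A={a_with:.2f}, alpha={alpha_with:.3f}")
print(f"without E: A={a_without:.2f}, alpha={alpha_without:.3f}")
```

On this synthetic data the first fit recovers the generating exponent, while the second yields a noticeably smaller one. In a real study E is unknown and must be estimated jointly with the other parameters, which is one of the places where fitting choices can diverge.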

Abstract

Modern foundation models rely heavily on using scaling laws to guide crucial training decisions. Researchers often extrapolate the optimal architecture and hyperparameter settings from smaller training runs by describing the relationship between loss, or task performance, and scale. All components of this process vary, from the specific equation being fit, to the training setup, to the optimization method. Each of these factors may affect the fitted law and, therefore, the conclusions of a given study. We discuss discrepancies in the conclusions that several prior works reach on questions such as the optimal token-to-parameter ratio. We augment this discussion with our own analysis of the critical impact that changes in specific details may have on a scaling study, and the resulting altered conclusions. Additionally, we survey over 50 papers that study scaling trends: while 45 of…

Code & Models

Repositories

hadasah/scaling_laws
PyTorch · Official

Models

🤗
misfitting/misfitting
model · ♡ 1

Datasets

open-athena/isoflop-experiments
dataset · 98 downloads

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification