An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem

Yoonsoo Nam; Nayara Fonseca; Seok Hyeong Lee; Chris Mingard; Ard A. Louis

arXiv:2404.17563 · cs.LG · April 30, 2025

TL;DR

This paper introduces an analytically solvable model that explains how neural networks acquire new skills and how their loss follows scaling laws in training time, data size, model size, and compute. Its predictions match simulations of a two-layer neural network trained on the multitask sparse parity problem.

Contribution

It provides a theoretical framework with explicit formulas for skill emergence and scaling laws, validated against neural network simulations.

Findings

01 Model captures sigmoidal skill emergence with a single parameter

02 Analytic expressions for loss scaling laws derived

03 Good agreement with neural network experiments
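The sigmoidal emergence named in the first finding can be illustrated with a toy reduction. The sketch below is not taken from the paper: it assumes a single skill whose strength is the product of two scalar parameters `a * b`, trained by gradient flow on a squared loss weighted by the skill's frequency `p`, with a balanced initialization `a = b`. Under those assumptions the strength `R = a * b` obeys `dR/dt = 2 p R (target - R)`, whose solution is a logistic curve. All function and parameter names are hypothetical.

```python
import numpy as np


def skill_strength_closed_form(t, p, target=1.0, r0=1e-3):
    """Logistic solution of dR/dt = 2 * p * R * (target - R) with R(0) = r0.

    Assumption: balanced initialization a = b = sqrt(r0), so R = a * b.
    """
    return target / (1.0 + (target / r0 - 1.0) * np.exp(-2.0 * p * target * t))


def skill_strength_euler(t_max, p, target=1.0, r0=1e-3, dt=1e-3):
    """Forward-Euler integration of gradient flow on 0.5 * p * (target - a*b)**2."""
    a = b = np.sqrt(r0)
    for _ in range(int(round(t_max / dt))):
        err = target - a * b
        # Simultaneous update: both right-hand sides use the old a and b.
        a, b = a + dt * p * b * err, b + dt * p * a * err
    return a * b


if __name__ == "__main__":
    # A frequent skill (large p) saturates earlier than a rare one (small p).
    for p in (1.0, 0.3):
        print(p, skill_strength_closed_form(5.0, p), skill_strength_euler(5.0, p))
```

In this toy setting the skill frequency `p` only rescales time, so rarer skills trace the same sigmoid later. Whether the paper's model reduces to exactly this form is an assumption of the sketch.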

Abstract

Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time, training data, or model size increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill-basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute. We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power-law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size or model size increases in the neural network.
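As a rough illustration of the data setup the abstract mentions, the sketch below generates a multitask sparse parity dataset with power-law task frequencies. The abstract does not give the construction, so the details here are assumptions: each input is a one-hot task indicator concatenated with random bits, each task owns a fixed random subset of `k` bit positions, the label is the parity of that subset, and task `i` is drawn with probability proportional to `i ** -exponent`. The function name, parameters, and default values are all hypothetical.

```python
import numpy as np


def sample_multitask_sparse_parity(n_samples, n_tasks=5, n_bits=16, k=3,
                                   exponent=1.5, seed=0):
    """Return (X, y, tasks, subsets, probs) for a toy multitask sparse parity set."""
    rng = np.random.default_rng(seed)

    # Assumption: every task has its own fixed subset of k distinct bit positions.
    subsets = np.stack(
        [rng.choice(n_bits, size=k, replace=False) for _ in range(n_tasks)]
    )

    # Power-law task frequencies over ranks 1..n_tasks, normalized to sum to 1.
    ranks = np.arange(1, n_tasks + 1)
    probs = ranks ** (-float(exponent))
    probs /= probs.sum()

    tasks = rng.choice(n_tasks, size=n_samples, p=probs)
    control = np.eye(n_tasks, dtype=np.int64)[tasks]      # one-hot task indicator
    bits = rng.integers(0, 2, size=(n_samples, n_bits))   # uniform random bits

    # Label: parity of the active task's subset of bits.
    rows = np.arange(n_samples)[:, None]
    y = bits[rows, subsets[tasks]].sum(axis=1) % 2

    X = np.concatenate([control, bits], axis=1)
    return X, y, tasks, subsets, probs


if __name__ == "__main__":
    X, y, tasks, subsets, probs = sample_multitask_sparse_parity(1000)
    print(X.shape, np.bincount(tasks, minlength=5) / 1000, probs)
```

Because rare tasks appear in few samples, a learner sees them less often, which is the mechanism that lets different skills emerge at different training times, data sizes, or model sizes.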

Taxonomy

Topics: Opinion Dynamics and Social Influence · Complex Network Analysis Techniques