The Power of Power Law: Asymmetry Enables Compositional Reasoning

Zixuan Wang; Xingyu Dang; Jason D. Lee; Kaifeng Lyu

arXiv:2604.22951 · cs.AI · April 28, 2026

TL;DR

This paper shows that training language models on power-law-distributed data, which mirrors the skewed frequencies of natural language, improves compositional reasoning and makes skill acquisition more data-efficient than training on uniformly distributed data. The claim is backed by both theoretical analysis and experiments.

Contribution

The paper shows that power-law data distributions let models learn compositional skills from less training data than uniform distributions do, and it gives a theoretical explanation: power-law sampling induces an asymmetry that improves the loss landscape.

Findings

01

Power-law training outperforms uniform training on compositional tasks.

02

In a minimalist skill-composition task, learning under a power-law distribution provably requires significantly less training data.

03

Power-law sampling induces an asymmetry that improves an otherwise pathological loss landscape, aiding skill learning.

Abstract

Natural language data follows a power-law distribution, with most knowledge and skills appearing at very low frequency. While a common intuition suggests that reweighting or curating data towards a uniform distribution may help models better learn these long-tail skills, we find a counterintuitive result: across a wide range of compositional reasoning tasks, such as state tracking and multi-step arithmetic, training under power-law distributions consistently outperforms training under uniform distributions. To understand this advantage, we introduce a minimalist skill-composition task and show that learning under a power-law distribution provably requires significantly less training data. Our theoretical analysis reveals that power law sampling induces a beneficial asymmetry that improves the pathological loss landscape, which enables models to first acquire high-frequency skill…
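To make the contrast between the two sampling regimes concrete, here is a minimal sketch that draws skill indices from a power-law distribution and from a uniform one, then compares how often the most and least frequent skills appear. The number of skills, the exponent `alpha = 1.0`, the sample count, and the function names are all illustrative assumptions; none of them come from the paper, and this is not the authors' experimental setup.

```python
import numpy as np


def skill_distribution(num_skills: int, alpha: float) -> np.ndarray:
    """Return p_k proportional to k**(-alpha) for ranks k = 1..num_skills.

    alpha = 0 gives the uniform distribution; larger alpha gives a heavier head.
    (Hypothetical helper, not taken from the paper.)
    """
    ranks = np.arange(1, num_skills + 1, dtype=float)
    weights = ranks ** (-alpha)
    return weights / weights.sum()


def sample_skill_counts(num_skills: int, alpha: float,
                        num_samples: int, seed: int = 0) -> np.ndarray:
    """Draw num_samples skill indices and count how often each skill occurs."""
    rng = np.random.default_rng(seed)
    probs = skill_distribution(num_skills, alpha)
    draws = rng.choice(num_skills, size=num_samples, p=probs)
    return np.bincount(draws, minlength=num_skills)


# Assumed, illustrative settings.
NUM_SKILLS = 100
NUM_SAMPLES = 10_000

uniform_counts = sample_skill_counts(NUM_SKILLS, alpha=0.0, num_samples=NUM_SAMPLES)
power_counts = sample_skill_counts(NUM_SKILLS, alpha=1.0, num_samples=NUM_SAMPLES)

print("uniform   : most frequent skill", uniform_counts[0],
      "| least frequent skill", uniform_counts[-1])
print("power law : most frequent skill", power_counts[0],
      "| least frequent skill", power_counts[-1])
```

Under the uniform setting every skill receives roughly the same number of examples, while under the power-law setting the head skills dominate and the tail skills are rare. That skew is the asymmetry the abstract refers to; the sketch only illustrates the data distribution, not the training dynamics or the theoretical result.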
