Learning curves theory for hierarchically compositional data with power-law distributed features
Francesco Cagnetta, Hyunmo Kang, Matthieu Wyart

TL;DR
This paper develops a theoretical framework linking hierarchically compositional data, modeled by probabilistic context-free grammars, to neural scaling laws. It shows how power-law distributed production rules shape the learning curves of classification and next-token prediction tasks.
Contribution
The paper unifies two explanations of neural scaling laws, one based on power-law distributed units and one based on hierarchically compositional data, by analyzing tasks built on probabilistic context-free grammars whose production rules follow a power-law distribution.
Findings
Power-law distributed production rules lead to power-law learning curves in classification.
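In classification, the hierarchical structure sets a large multiplicative constant in the learning curve, while the exponent depends on the distribution of production rules.
In next-token prediction, the distribution of production rules controls the local details of the learning curve but not its large-scale behavior.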
Abstract
Recent theories suggest that Neural Scaling Laws arise whenever the task is linearly decomposed into power-law distributed units. Alternatively, scaling laws also emerge when data exhibit a hierarchically compositional structure, as is thought to occur in language and images. To unify these views, we consider classification and next-token prediction tasks based on probabilistic context-free grammars -- probabilistic models that generate data via a hierarchy of production rules. For classification, we show that having power-law distributed production rules results in a power-law learning curve with an exponent depending on the rules' distribution and a large multiplicative constant that depends on the hierarchical structure. By contrast, for next-token prediction, the distribution of production rules controls the local details of the learning curve, but not the exponent describing the…
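To make the data model concrete, here is a minimal sketch of sampling from a toy hierarchical grammar whose production rules are chosen with power-law (Zipf-like) probabilities. It is an illustration only, not the authors' construction: the function names, the parameters `vocab_size`, `num_rules`, `branching`, `depth`, and the parameterization of the rule probabilities as proportional to k^-(1+a) are all assumptions made for this example. Unlike a carefully built grammar, this toy version does not prevent two symbols from sharing the same right-hand side.

```python
import random


def zipf_weights(n, a):
    """Normalized weights proportional to k^-(1+a) for ranks k = 1..n (assumed form)."""
    raw = [(k + 1) ** -(1.0 + a) for k in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def build_grammar(vocab_size, num_rules, branching, depth, a, rng):
    """Build one random rule table per level of the hierarchy.

    Each symbol gets `num_rules` productions, each expanding it into a
    tuple of `branching` lower-level symbols. All names are hypothetical.
    """
    probs = zipf_weights(num_rules, a)
    grammar = []
    for _ in range(depth):
        level = {
            sym: [
                tuple(rng.randrange(vocab_size) for _ in range(branching))
                for _ in range(num_rules)
            ]
            for sym in range(vocab_size)
        }
        grammar.append(level)
    return grammar, probs


def sample(grammar, probs, root, rng):
    """Expand the root symbol level by level, picking rules with Zipf weights."""
    sequence = [root]
    for level in grammar:
        expanded = []
        for sym in sequence:
            rhs = rng.choices(level[sym], weights=probs, k=1)[0]
            expanded.extend(rhs)
        sequence = expanded
    return sequence


if __name__ == "__main__":
    rng = random.Random(0)
    grammar, probs = build_grammar(
        vocab_size=4, num_rules=3, branching=2, depth=3, a=1.0, rng=rng
    )
    # The root symbol plays the role of a class label; the leaves are the input.
    print(sample(grammar, probs, root=0, rng=rng))
```

In this sketch the root symbol can be read as the label of a classification task and the leaf sequence as the input, while a next-token prediction task would use the leaf sequences alone. Each sample has `branching ** depth` leaves.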
Taxonomy
Topics: Language and cultural evolution · Neural Networks and Applications · Ferroelectric and Negative Capacitance Devices
