Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra
Roman Worschech, Bernd Rosenow

TL;DR
This paper gives a theoretical analysis of neural scaling laws in two-layer networks trained on data whose covariance matrix has a power-law spectrum, showing how this data structure shapes the learning dynamics and the generalization error.
Contribution
It applies techniques from statistical mechanics to one-pass stochastic gradient descent in a student-teacher setup of two-layer networks with power-law data spectra, giving a theoretical account of scaling behavior that has so far been observed mostly empirically.
Findings
Derives analytical expressions for generalization error with linear activations.
Identifies conditions for power-law scaling in learning curves.
Shows transition from exponential to power-law convergence in certain regimes.
Abstract
Neural scaling laws describe how the performance of deep neural networks scales with key factors such as training data size, model complexity, and training time, often following power-law behaviors over multiple orders of magnitude. Despite their empirical observation, the theoretical understanding of these scaling laws remains limited. In this work, we employ techniques from statistical mechanics to analyze one-pass stochastic gradient descent within a student-teacher framework, where both the student and teacher are two-layer neural networks. Our study primarily focuses on the generalization error and its behavior in response to data covariance matrices that exhibit power-law spectra. For linear activation functions, we derive analytical expressions for the generalization error, exploring different learning regimes and identifying conditions under which power-law scaling emerges.…
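The setup described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code, and it rests on several simplifying assumptions that do not come from the abstract: the student and teacher are each reduced to a single linear unit, the covariance eigenvalues are taken as `lam_k ∝ k^-(1 + beta)` and normalized to sum to `n`, the teacher weights are all ones, and the names `power_law_spectrum`, `generalization_error`, `one_pass_sgd`, `beta`, `eta`, and `alpha_max` are hypothetical. Each SGD step draws a fresh input, so no example is ever reused (one-pass training).

```python
import numpy as np


def power_law_spectrum(n, beta):
    """Eigenvalues lam_k proportional to k^-(1 + beta), normalized so they sum to n (assumed form)."""
    k = np.arange(1, n + 1, dtype=float)
    lam = k ** -(1.0 + beta)
    return lam * n / lam.sum()


def generalization_error(w, w_star, lam):
    """0.5 * E[(student - teacher)^2] for linear units, evaluated in the covariance eigenbasis."""
    d = w - w_star
    return 0.5 * np.sum(lam * d**2) / len(lam)


def one_pass_sgd(n=100, beta=0.5, eta=0.5, alpha_max=50, seed=0):
    """Run one-pass SGD for alpha_max * n steps and return the error after every step."""
    rng = np.random.default_rng(seed)
    lam = power_law_spectrum(n, beta)
    w_star = np.ones(n)   # teacher weights (illustrative choice)
    w = np.zeros(n)       # student starts at zero
    steps = alpha_max * n
    errors = np.empty(steps + 1)
    errors[0] = generalization_error(w, w_star, lam)
    for t in range(steps):
        # Fresh Gaussian input with diagonal covariance diag(lam).
        x = rng.normal(size=n) * np.sqrt(lam)
        # Difference between teacher and student outputs on this input.
        delta = (w_star - w) @ x / np.sqrt(n)
        # Gradient step on the squared error of this single example.
        w += eta / np.sqrt(n) * delta * x
        errors[t + 1] = generalization_error(w, w_star, lam)
    return errors


if __name__ == "__main__":
    errors = one_pass_sgd()
    print("initial error:", errors[0])
    print("final error:  ", errors[-1])
```

Plotting `errors` against `alpha = step / n` on log-log axes is one way to inspect whether the learning curve in this toy setting looks like a power law; the exponents and regimes analyzed in the paper are not reproduced by this sketch.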
Taxonomy
Topics: Neural Networks and Applications
