Power-Law Spectrum of the Random Feature Model
Elliot Paquette, Ke Liang Xiao, Yizhe Zhu

TL;DR
This paper investigates how the spectral power-law decay of data covariance matrices is preserved or altered after passing through a random feature layer with nonlinear activation, revealing that the power-law structure is largely maintained with minor logarithmic modifications.
Contribution
The paper provides a rigorous characterization of the eigenvalue spectrum of the random feature covariance for data with power-law spectra, showing that the spectral decay exponent is preserved under random nonlinear projections.
Findings
Eigenvalues of the random-feature covariance follow a power-law decay with the same exponent as the input covariance.
Logarithmic corrections depend on the degree of the monomial activation.
The spectral structure is preserved up to polylogarithmic factors.
Abstract
Scaling laws for neural networks, in which the loss decays as a power-law in the number of parameters, data, and compute, depend fundamentally on the spectral structure of the data covariance, with power-law eigenvalue decay appearing ubiquitously in vision and language tasks. A central question is whether this spectral structure is preserved or destroyed when data passes through the basic building block of a neural network: a random linear projection followed by a nonlinear activation. We study this question for the random feature model: given data $x$ whose covariance has a power-law spectrum, a Gaussian sketch matrix $W$, and an entrywise monomial $f$, we characterize the eigenvalues of the population random-feature covariance $\mathbb{E}_{x }[\frac{1}{d}f(W^\top x…
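The setup in the abstract can be explored numerically. The following is a minimal sketch, not the authors' method. It assumes a diagonal input covariance with eigenvalues j^(-alpha), a sketch matrix with i.i.d. standard Gaussian entries, the monomial f(y) = y^p, and that the truncated formula denotes the second-moment matrix of the features scaled by 1/d. The names `random_feature_spectrum` and `loglog_slope` and all parameter values are hypothetical, and the population expectation is replaced by a Monte Carlo average.

```python
import numpy as np

def random_feature_spectrum(v=200, d=100, alpha=1.5, p=2,
                            n_samples=20000, seed=0):
    """Monte Carlo estimate of the eigenvalues of (1/d) E[f(W^T x) f(W^T x)^T].

    Assumptions (not taken from the paper): x ~ N(0, diag(j^-alpha)),
    W has i.i.d. N(0, 1) entries, and f(y) = y**p applied entrywise.
    """
    rng = np.random.default_rng(seed)
    # Assumed power-law input spectrum: lambda_j = j^(-alpha), j = 1..v
    lam = np.arange(1, v + 1, dtype=float) ** (-alpha)
    # Gaussian sketch matrix of shape (v, d)
    W = rng.standard_normal((v, d))
    # Samples x ~ N(0, diag(lam)), one sample per row
    X = rng.standard_normal((n_samples, v)) * np.sqrt(lam)
    # Entrywise monomial features f(W^T x), shape (n_samples, d)
    F = (X @ W) ** p
    # Empirical second-moment matrix of the features, scaled by 1/d
    K = (F.T @ F) / (n_samples * d)
    # eigvalsh returns ascending order; reverse to get descending
    return np.linalg.eigvalsh(K)[::-1]

def loglog_slope(eig, lo, hi):
    """Least-squares slope of log(eig_j) against log(j) for lo <= j < hi (1-indexed)."""
    j = np.arange(lo, hi)
    slope, _ = np.polyfit(np.log(j), np.log(eig[lo - 1:hi - 1]), 1)
    return slope

if __name__ == "__main__":
    eig = random_feature_spectrum()
    print("fitted log-log slope:", loglog_slope(eig, 2, 40))
```

The fitted slope can be compared against the input exponent, with the caveat that finite dimensions and sampling noise distort the tail of the estimated spectrum, so only the leading eigenvalues are informative at these sizes.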
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Quantum many-body systems · Neural Networks and Applications
