From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning
Noa Rubin, Kirsten Fischer, Javed Lindner, David Dahmen, Inbar Seroussi, Zohar Ringel, Michael Krämer, Moritz Helias

TL;DR
This paper develops a unified theoretical framework for feature learning in neural networks. It bridges two views of training, a change in kernel scale and a directional adaptation of the kernel to the data, and describes network output statistics across scaling regimes.
Contribution
It introduces a multi-scale adaptive theory that connects kernel rescaling and kernel adaptation, and uses methods from statistical mechanics to derive analytical expressions for network output statistics.
Findings
Kernel adaptation can be approximated by kernel rescaling in linear networks.
Multi-scale analysis captures directional feature learning effects.
The framework applies across different scaling regimes and network types.
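To make the difference between kernel rescaling and directional kernel adaptation concrete, here is a minimal illustrative sketch. It is not the paper's method: it assumes a plain Gaussian-process mean predictor with a linear kernel, and the names `linear_kernel`, `gp_mean`, `Sigma_scaled`, and `Sigma_dir` are hypothetical. Under these assumptions, rescaling the kernel by a factor `a` is the same as dividing the regularization noise by `a`, whereas a directional change adds a low-rank term that no rescaling can reproduce.

```python
import numpy as np

def linear_kernel(X1, X2, Sigma):
    """Linear-model kernel k(x, x') = x^T Sigma x' / d (Sigma: assumed weight covariance)."""
    d = X1.shape[1]
    return X1 @ Sigma @ X2.T / d

def gp_mean(K_train, K_cross, y, noise):
    """GP posterior mean at test points: K_cross (K_train + noise * I)^{-1} y."""
    n = K_train.shape[0]
    return K_cross @ np.linalg.solve(K_train + noise * np.eye(n), y)

rng = np.random.default_rng(0)
n, n_test, d = 20, 5, 10
X = rng.standard_normal((n, d))
X_test = rng.standard_normal((n_test, d))

# Toy linear teacher along a single unit direction (illustrative data only).
w_unit = rng.standard_normal(d)
w_unit /= np.linalg.norm(w_unit)
y = X @ w_unit + 0.1 * rng.standard_normal(n)

noise, a, c = 0.5, 3.0, 5.0
Sigma_iso = np.eye(d)                                 # reference kernel
Sigma_scaled = a * np.eye(d)                          # pure rescaling
Sigma_dir = np.eye(d) + c * np.outer(w_unit, w_unit)  # directional change

def predict(Sigma, noise_level):
    K_train = linear_kernel(X, X, Sigma)
    K_cross = linear_kernel(X_test, X, Sigma)
    return gp_mean(K_train, K_cross, y, noise_level)

mean_iso = predict(Sigma_iso, noise)
mean_scaled = predict(Sigma_scaled, noise)
mean_equiv = predict(Sigma_iso, noise / a)  # same kernel, rescaled noise
mean_dir = predict(Sigma_dir, noise)

# Rank of the kernel change introduced by the directional term.
K_diff = linear_kernel(X, X, Sigma_dir) - linear_kernel(X, X, Sigma_iso)
rank_of_change = np.linalg.matrix_rank(K_diff)

print("rescaling == noise rescaling:", np.allclose(mean_scaled, mean_equiv))
print("rank of directional kernel change:", rank_of_change)
```

In this toy setting the rescaled kernel only shifts the effective regularization, while the directional kernel differs from the reference by a rank-one term aligned with the teacher direction.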
Abstract
Feature learning in neural networks is crucial for their expressive power and inductive biases, motivating various theoretical approaches. Some approaches describe network behavior after training through a change in kernel scale from initialization, resulting in a generalization power comparable to a Gaussian process. Conversely, in other approaches training results in the adaptation of the kernel to the data, involving directional changes to the kernel. The relationship and respective strengths of these two views have so far remained unresolved. This work presents a theoretical framework of multi-scale adaptive feature learning bridging these two views. Using methods from statistical mechanics, we derive analytical expressions for network output statistics which are valid across scaling regimes and in the continuum between them. A systematic expansion of the network's probability…
Taxonomy
Topics: Neural Networks and Applications
