Critical feature learning in deep neural networks
Kirsten Fischer, Javed Lindner, David Dahmen, Zohar Ringel, Michael Krämer, Moritz Helias

TL;DR
This paper develops a theoretical framework for understanding feature learning in deep neural networks by analyzing finite-width effects, kernel evolution, and the role of fluctuations in the Bayesian prior.
Contribution
It introduces a systematic theory of network kernels in finite-width deep networks, linking feature learning to criticality and prior fluctuations.
Findings
The variance of the kernel distribution depends inversely on the network width N.
Backward propagation aligns kernels with target features.
Finite-width fluctuations enable kernel adaptation to data.
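The inverse dependence of kernel fluctuations on the width N can be checked numerically in a toy setting. The sketch below is not the paper's theory or code. It assumes a single hidden layer with tanh activations, Gaussian weights of variance 1/d, and two arbitrary fixed inputs. All names here (kernel_samples, x1, x2, widths) are hypothetical and chosen for illustration. For each width it draws many random networks, computes one off-diagonal entry of the empirical kernel, and compares the variance of that entry across widths.

```python
import numpy as np

def kernel_samples(width, n_draws, x1, x2, rng):
    """Draw n_draws random hidden layers of the given width and return
    the empirical kernel entry K(x1, x2) = mean_i tanh(h_i(x1)) * tanh(h_i(x2))
    for each draw. Weight variance 1/d is an assumption of this sketch."""
    d = x1.shape[0]
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(n_draws, width, d))
    h1 = W @ x1  # pre-activations for x1, shape (n_draws, width)
    h2 = W @ x2  # pre-activations for x2, shape (n_draws, width)
    return np.mean(np.tanh(h1) * np.tanh(h2), axis=1)

rng = np.random.default_rng(0)
x1 = np.array([1.0, 0.5, -0.3])   # arbitrary illustrative inputs
x2 = np.array([0.8, -0.2, 0.4])
widths = [50, 100, 200, 400]

# Variance of the kernel entry across random networks, per width.
variances = {N: kernel_samples(N, 2000, x1, x2, rng).var() for N in widths}
# If the variance scales as 1/N, then N * variance is roughly constant.
scaled = {N: N * v for N, v in variances.items()}

for N in widths:
    print(f"N={N:4d}  var={variances[N]:.3e}  N*var={scaled[N]:.3f}")
```

In this toy model the kernel entry is an average of N independent terms, so its variance falls as 1/N by construction. The deep, non-linear, data-conditioned case treated in the paper is more involved, and this check only illustrates the scaling at the level of the prior.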
Abstract
A key property of neural networks driving their success is their ability to learn features from data. Understanding feature learning from a theoretical viewpoint is an emerging field with many open questions. In this work we capture finite-width effects with a systematic theory of network kernels in deep non-linear neural networks. We show that the Bayesian prior of the network can be written in closed form as a superposition of Gaussian processes, whose kernels are distributed with a variance that depends inversely on the network width N. A large deviation approach, which is exact in the proportional limit for the number of data points, yields a pair of forward-backward equations for the maximum a posteriori kernels in all layers at once. We study their solutions perturbatively to demonstrate how the backward propagation across layers aligns kernels…
Taxonomy
Topics: Neural Networks and Applications
