Statistical physics of deep learning: Optimal learning of a multi-layer perceptron near interpolation
Jean Barbier, Francesco Camilli, Minh-Toan Nguyen, Mauro Pastore, Rudy Skerk

TL;DR
This paper uses statistical physics to analyze the supervised learning of multi-layer perceptrons in the interpolation regime, where the number of trainable parameters and the number of data are comparable, revealing how they learn features and how depth, width, and data affect their performance.
Contribution
It extends statistical physics analysis to deep neural networks, identifying fundamental learning limits and the dynamics of feature learning in multi-layer perceptrons.
Findings
Optimal performance is achieved through layer-wise specialization.
Deeper targets are more difficult to learn.
Finite width and non-linearity influence feature learning.
Abstract
For four decades statistical physics has been providing a framework to analyse neural networks. A long-standing question remained on its capacity to tackle deep learning models capturing rich feature learning effects, thus going beyond the narrow networks or kernel methods analysed until now. We positively answer through the study of the supervised learning of a multi-layer perceptron. Importantly, (i) its width scales as the input dimension, making it more prone to feature learning than ultra wide networks, and more expressive than narrow ones or ones with fixed embedding layers; and (ii) we focus on the challenging interpolation regime where the number of trainable parameters and data are comparable, which forces the model to adapt to the task. We consider the matched teacher-student setting. Therefore, we provide the fundamental limits of learning random deep neural network targets…
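The matched teacher-student setting described in the abstract can be illustrated with a small data-generation sketch. It is only a minimal sketch under stated assumptions, not the authors' construction: the Gaussian weights and inputs, the tanh activation, the scalar noiseless readout, the 1/sqrt(fan-in) scaling, and the names `make_teacher`, `teacher_forward`, `count_params`, `sample_dataset`, `width_ratio`, and `sample_ratio` are all hypothetical choices made for illustration. What it takes from the abstract is that the hidden width scales with the input dimension and that the number of samples is comparable to the number of trainable parameters.

```python
import numpy as np


def make_teacher(d, width_ratio=1.0, depth=2, seed=0):
    """Draw random weights for a teacher MLP (assumed: i.i.d. standard Gaussian)."""
    rng = np.random.default_rng(seed)
    k = max(1, int(width_ratio * d))  # hidden width proportional to input dimension
    sizes = [d] + [k] * depth
    hidden = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(depth)]
    readout = rng.standard_normal(k)  # assumed scalar output
    return hidden, readout


def teacher_forward(X, hidden, readout, act=np.tanh):
    """Forward pass with 1/sqrt(fan-in) scaling (assumed normalization)."""
    h = X
    for W in hidden:
        h = act(h @ W.T / np.sqrt(W.shape[1]))
    return h @ readout / np.sqrt(readout.shape[0])


def count_params(hidden, readout):
    """Total number of trainable parameters of the network."""
    return sum(W.size for W in hidden) + readout.size


def sample_dataset(d, sample_ratio=1.0, width_ratio=1.0, depth=2, seed=0):
    """Sample n = sample_ratio * (#parameters) labelled points from a random teacher."""
    hidden, readout = make_teacher(d, width_ratio, depth, seed)
    n = int(sample_ratio * count_params(hidden, readout))
    rng = np.random.default_rng(seed + 1)
    X = rng.standard_normal((n, d))  # assumed Gaussian inputs
    y = teacher_forward(X, hidden, readout)  # assumed noiseless labels
    return X, y, (hidden, readout)


if __name__ == "__main__":
    X, y, (hidden, readout) = sample_dataset(d=20, sample_ratio=0.5)
    print(X.shape, y.shape, count_params(hidden, readout))
```

In this sketch a student would be a network of the same architecture fitted to `(X, y)`. Varying `sample_ratio` moves the problem between the data-scarce and data-rich sides of the regime in which parameters and samples are comparable.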
