Generalization performance of narrow one-hidden layer networks in the teacher-student setting
Rodrigo Pérez Ortiz, Gibbs Nwemadji, Jean Barbier, Federica Gerace, Alessandro Ingrosso, Clarissa Lauditi, Enrico M. Malatesta

TL;DR
This paper provides a theoretical analysis of the generalization performance of one-hidden-layer neural networks in the teacher-student setting, for networks whose width is large but much smaller than the input dimension, and reveals a phase transition to feature specialization.
Contribution
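It develops a general framework, based on methods from statistical physics, that yields closed-form expressions for the typical performance of fully connected one-hidden-layer networks with generic activation functions, written in terms of a small number of order parameters. A complete theoretical characterization of this setting was previously missing.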
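The teacher-student setting can be made concrete with a short simulation. The sketch below is a minimal illustration under assumptions of our own, not the paper's exact model: standard Gaussian inputs, a tanh activation, second-layer weights fixed to one, a 1/sqrt(d) pre-activation scaling, and hypothetical helper names (`make_network`, `forward`, `sample_dataset`, `generalization_error`). It draws a teacher, labels training data with it, and estimates a student's generalization error by Monte Carlo on fresh inputs.

```python
import math
import random


def make_network(d, k, rng):
    """Hypothetical helper: K hidden units with i.i.d. standard Gaussian input weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def forward(weights, x, activation=math.tanh):
    """One-hidden-layer output f(x) = (1/sqrt(K)) * sum_k act(w_k . x / sqrt(d)).

    Second-layer weights are fixed to 1 here, an assumption of this sketch.
    """
    d = len(x)
    k = len(weights)
    pre = (sum(wi * xi for wi, xi in zip(w, x)) / math.sqrt(d) for w in weights)
    return sum(activation(h) for h in pre) / math.sqrt(k)


def sample_dataset(teacher, n, rng):
    """Draw n Gaussian inputs and label them with the (noiseless) teacher."""
    d = len(teacher[0])
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [forward(teacher, x) for x in xs]
    return xs, ys


def generalization_error(student, teacher, n_test, rng):
    """Monte Carlo estimate of 0.5 * E_x[(f_student(x) - f_teacher(x))^2] on fresh inputs."""
    xs, ys = sample_dataset(teacher, n_test, rng)
    return 0.5 * sum((forward(student, x) - y) ** 2 for x, y in zip(xs, ys)) / n_test


rng = random.Random(0)
d, k = 200, 10  # width k is much smaller than the input dimension d
teacher = make_network(d, k, rng)
student = make_network(d, k, rng)  # untrained student, for comparison only
train_x, train_y = sample_dataset(teacher, n=400, rng=rng)  # n / d = 2 samples per input dimension
err = generalization_error(student, teacher, n_test=200, rng=rng)
```

A trained student would replace the random one; for a classification variant, one could take the sign of the teacher output as the label and measure the test misclassification rate instead.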
Findings
Identifies a phase transition to feature specialization as sample size increases.
Accurately predicts the generalization error in both regression and classification tasks.
Provides a unified theory covering both finite-temperature (Bayesian) estimators and empirical risk minimization in this setting.
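One standard way to place both estimators in a single framework is a Gibbs measure at inverse temperature β over the student weights W = (w_1, ..., w_K), given training data D = {(x^μ, y^μ)} for μ = 1, ..., n, a loss ℓ and a regularizer r. The notation below is an assumption for illustration, not the paper's exact definitions; a_k denotes the second-layer weights.

```latex
\begin{aligned}
f_W(\mathbf{x}) &= \frac{1}{\sqrt{K}} \sum_{k=1}^{K} a_k\, \sigma\!\left(\frac{\mathbf{w}_k \cdot \mathbf{x}}{\sqrt{d}}\right), \qquad K \ll d, \\
P_\beta(W \mid \mathcal{D}) &= \frac{1}{Z_\beta(\mathcal{D})} \exp\!\left(-\beta \left[\sum_{\mu=1}^{n} \ell\big(y^\mu, f_W(\mathbf{x}^\mu)\big) + r(W)\right]\right), \\
\hat{W}_{\mathrm{ERM}} &= \operatorname*{arg\,min}_{W} \left[\sum_{\mu=1}^{n} \ell\big(y^\mu, f_W(\mathbf{x}^\mu)\big) + r(W)\right].
\end{aligned}
```

With β = 1, ℓ equal to the negative log-likelihood of the teacher's output channel and r equal to the negative log-prior, P_β is the Bayesian posterior; as β → ∞ the measure concentrates on the ERM minimizer (assuming it is unique).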
Abstract
Understanding the generalization properties of neural networks on simple input-output distributions is key to explaining their performance on real datasets. The classical teacher-student setting, where a network is trained on data generated by a teacher model, provides a canonical theoretical test bed. In this context, a complete theoretical characterization of fully connected one-hidden-layer networks with generic activation functions remains missing. In this work, we develop a general framework for such networks with large width, yet much smaller than the input dimension. Using methods from statistical physics, we derive closed-form expressions for the typical performance of both finite-temperature (Bayesian) and empirical risk minimization estimators in terms of a small number of order parameters. We uncover a transition to a specialization phase, where hidden neurons align with…
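Order parameters in this kind of analysis are typically overlaps between weight vectors, such as the student-student matrix Q = W Wᵀ/d and the student-teacher matrix M = W W*ᵀ/d. The paper's exact order parameters may differ; the sketch below only shows how such overlaps can be computed from sampled weights, together with a hypothetical `specialization_score` diagnostic (our own, not from the paper) that takes, for each student unit, its largest absolute cosine similarity with any teacher unit. The maximum over teacher units accounts for the permutation symmetry of hidden neurons.

```python
import math
import random


def overlaps(a, b):
    """Normalized overlap matrix O[i][j] = a_i . b_j / d between the rows of a and b."""
    d = len(a[0])
    return [[sum(x * y for x, y in zip(ai, bj)) / d for bj in b] for ai in a]


def specialization_score(student, teacher):
    """Hypothetical diagnostic: mean over student units of the largest |cosine| with any teacher unit.

    Close to 0 for unrelated high-dimensional weights; equal to 1 (up to rounding)
    when every student unit is parallel to some teacher unit.
    """
    m = overlaps(student, teacher)  # student-teacher overlaps M
    q = [overlaps([s], [s])[0][0] for s in student]  # diagonal of Q
    t = [overlaps([w], [w])[0][0] for w in teacher]  # teacher self-overlaps
    best = [
        max(abs(m[i][j]) / math.sqrt(q[i] * t[j]) for j in range(len(teacher)))
        for i in range(len(student))
    ]
    return sum(best) / len(best)


rng = random.Random(1)
d, k = 500, 5
teacher = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
random_student = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
permuted_student = [list(w) for w in reversed(teacher)]  # teacher units in a different order

random_score = specialization_score(random_student, teacher)  # small: random directions are nearly orthogonal
permuted_score = specialization_score(permuted_student, teacher)  # 1 up to floating-point rounding
```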
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications
