Generalization performance of narrow one-hidden layer networks in the teacher-student setting

Rodrigo Pérez Ortiz; Gibbs Nwemadji; Jean Barbier; Federica Gerace; Alessandro Ingrosso; Clarissa Lauditi; Enrico M. Malatesta

arXiv:2507.00629·cond-mat.dis-nn·March 26, 2026

Open Access

TL;DR

This paper analyzes the generalization performance of narrow one-hidden-layer neural networks (hidden width large but much smaller than the input dimension) in a teacher-student setting, revealing a phase transition to feature specialization.

Contribution

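It develops a general statistical-physics framework that yields closed-form expressions for the typical performance of these networks in terms of a small number of order parameters, addressing the missing theoretical characterization of fully connected one-hidden-layer networks with generic activation functions.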

Findings

01 Identifies a phase transition to feature specialization as sample size increases.

02 Accurately predicts generalization error for regression and classification tasks.

03 Provides a unified theory for Bayesian and empirical risk minimization in this setting.

Abstract

Understanding the generalization properties of neural networks on simple input-output distributions is key to explaining their performance on real datasets. The classical teacher-student setting, where a network is trained on data generated by a teacher model, provides a canonical theoretical test bed. In this context, a complete theoretical characterization of fully connected one-hidden-layer networks with generic activation functions remains missing. In this work, we develop a general framework for such networks with large width, yet much smaller than the input dimension. Using methods from statistical physics, we derive closed-form expressions for the typical performance of both finite-temperature (Bayesian) and empirical risk minimization estimators in terms of a small number of order parameters. We uncover a transition to a specialization phase, where hidden neurons align with…
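As a rough illustration of the teacher-student setting described in the abstract, the sketch below generates labels from a fixed one-hidden-layer teacher and estimates the test error of a student with the same architecture. It is not the paper's method. The Gaussian inputs and weights, the tanh activation, the 1/sqrt(d) and 1/sqrt(k) scalings, the all-ones readout, the half squared error, and every function name are assumptions made for this example.

```python
import math
import random


def network_output(weights, readout, x, activation=math.tanh):
    """One-hidden-layer network with k hidden units on a d-dimensional input.

    The 1/sqrt(d) and 1/sqrt(k) scalings are assumptions of this sketch.
    """
    d, k = len(x), len(weights)
    hidden = [
        activation(sum(w_i * x_i for w_i, x_i in zip(w, x)) / math.sqrt(d))
        for w in weights
    ]
    return sum(a * h for a, h in zip(readout, hidden)) / math.sqrt(k)


def sample_gaussian_matrix(rows, cols, rng):
    """Matrix of i.i.d. standard Gaussian entries (assumed distribution)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def make_dataset(n, d, k, seed=0):
    """Draw a random teacher and label n random inputs with it."""
    rng = random.Random(seed)
    teacher_w = sample_gaussian_matrix(k, d, rng)
    teacher_a = [1.0] * k  # fixed readout, an assumption of this sketch
    inputs = sample_gaussian_matrix(n, d, rng)
    labels = [network_output(teacher_w, teacher_a, x) for x in inputs]
    return inputs, labels, teacher_w, teacher_a


def estimate_generalization_error(student_w, student_a, teacher_w, teacher_a,
                                  d, n_test=500, seed=1):
    """Monte Carlo estimate of half the mean squared teacher-student gap."""
    rng = random.Random(seed)
    total = 0.0
    for x in sample_gaussian_matrix(n_test, d, rng):
        gap = (network_output(student_w, student_a, x)
               - network_output(teacher_w, teacher_a, x))
        total += 0.5 * gap * gap
    return total / n_test
```

In this toy version, a student whose weights equal the teacher's has zero estimated error, while a randomly initialized student generally does not. Training the student, by Bayesian sampling or by empirical risk minimization, is deliberately left out.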

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications