Learning curves of generic features maps for realistic datasets with a teacher-student model
Bruno Loureiro, Cédric Gerbelot, Hugo Cui, Sebastian Goldt, Florent Krzakala, Marc Mézard, Lenka Zdeborová

TL;DR
This paper extends the teacher-student model to include generic feature maps, enabling the analysis of learning curves for realistic datasets and bridging the gap between theory and practical data scenarios.
Contribution
We introduce a Gaussian covariate generalization of the teacher-student model that captures realistic data behaviors and derive closed-form formulas for training loss and generalization error.
Findings
The model accurately predicts learning curves for kernel regression and classification.
It applies to feature maps like random projections, scattering transforms, and neural network features.
The paper discusses both the power and the limitations of the framework for modeling real-world data.
Abstract
Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumptions of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalisation of the model where the teacher and student can act on different spaces, generated with fixed, but generic feature maps. While still solvable in a closed form, this generalization is able to capture the learning curves for a broad range of realistic data sets, thus redeeming the potential of the teacher-student framework. Our contribution is then two-fold: First, we prove a rigorous formula for the asymptotic training loss and generalisation error. Second, we present a number of situations…
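The setup described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's method or its asymptotic formula: it draws teacher and student features from a joint Gaussian, labels the data with the teacher, fits the student, and measures the training loss and generalisation error empirically. The linear teacher, the random joint covariance, the ridge-regression student, the square loss, and all names (`simulate_gaussian_covariate`, `theta0`, `lam`) are assumptions made for illustration.

```python
import numpy as np

def simulate_gaussian_covariate(n_train=400, n_test=2000, p=50, d=80,
                                lam=0.1, seed=0):
    """Empirical training loss and generalisation error for one sample size.

    Assumptions (not taken from the paper): linear teacher, ridge-regression
    student, square loss, and a randomly generated joint covariance.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical joint covariance of teacher features u (dimension p)
    # and student features v (dimension d); a well-conditioned Wishart draw.
    dim = p + d
    A = rng.standard_normal((dim, 2 * dim))
    Sigma = A @ A.T / (2 * dim)
    L = np.linalg.cholesky(Sigma)

    theta0 = rng.standard_normal(p)  # teacher weights

    def sample(n):
        # Rows of z are jointly Gaussian with covariance Sigma.
        z = rng.standard_normal((n, dim)) @ L.T
        u, v = z[:, :p], z[:, p:]
        y = u @ theta0 / np.sqrt(p)  # labels come from the teacher space
        return v, y                  # the student only sees v

    V, y = sample(n_train)
    V_test, y_test = sample(n_test)

    # Ridge regression on the student features.
    X = V / np.sqrt(d)
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    train_loss = float(np.mean((X @ w - y) ** 2))
    gen_error = float(np.mean((V_test @ w / np.sqrt(d) - y_test) ** 2))
    return train_loss, gen_error

if __name__ == "__main__":
    # An empirical learning curve: error as a function of the sample size.
    for n in (50, 100, 200, 400, 800):
        tl, ge = simulate_gaussian_covariate(n_train=n)
        print(f"n={n:4d}  train={tl:.4f}  test={ge:.4f}")
```

Sweeping `n_train` as in the final loop traces a simulated learning curve. In the paper's setting, curves of this kind are what the closed-form asymptotic expressions are meant to predict; the simulation above only provides a finite-size estimate.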
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification · Neural Networks and Applications
