Universal mean field upper bound for the generalisation gap of deep neural networks
S. Ariosto, R. Pacelli, F. Ginelli, M. Gherardi, P. Rotondo

TL;DR
This paper uses replica mean field theory to derive a universal upper bound on the generalisation gap of deep neural networks. The bound shrinks faster than existing bounds as the dataset size grows, and it applies across a range of architectures.
Contribution
It introduces a novel theoretical framework that combines replica mean field theory with statistical learning theory to bound the generalisation gap of DNNs, covering the case where only the last layer is optimised.
Findings
The generalisation gap approaches zero faster than 2 N_out / P as the dataset size P grows, where N_out is the size of the last layer.
The bound is tighter than existing bounds from statistical learning theory.
The predictions are validated on diverse neural network architectures.
Abstract
Modern deep neural networks (DNNs) represent a formidable challenge for theorists: according to the commonly accepted probabilistic framework that describes their performance, these architectures should overfit due to the huge number of parameters to train, but in practice they do not. Here we employ results from replica mean field theory to compute the generalisation gap of machine learning models with quenched features, in the teacher-student scenario and for regression problems with quadratic loss function. Notably, this framework includes the case of DNNs where the last layer is optimised given a specific realisation of the remaining weights. We show how these results, combined with ideas from statistical learning theory, provide a stringent asymptotic upper bound on the generalisation gap of fully trained DNNs as a function of the size of the dataset P. In particular, in the…
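The setting described in the abstract (quenched features, a teacher-student regression task, quadratic loss, and a last layer optimised on top of frozen weights) can be illustrated with a small numerical sketch. This is not the authors' code. The tanh teacher, the tanh random features, the sizes, and the function names below are all hypothetical choices made for illustration. N_out is taken to be the number of frozen features feeding the readout. The value 2 N_out / P is returned only as a reference scale; the paper's normalisation of the loss is not reproduced here, so the sketch does not claim to test the bound itself.

```python
import math
import random


def features(x, hidden_w):
    """Quenched (frozen) random features: tanh of fixed random projections of x."""
    return [math.tanh(sum(w_i * x_i for w_i, x_i in zip(w, x))) for w in hidden_w]


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def fit_last_layer(phi, y, ridge=1e-8):
    """Least-squares readout via the normal equations (tiny ridge for numerical stability)."""
    n = len(phi[0])
    gram = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(row[i] * y_k for row, y_k in zip(phi, y)) for i in range(n)]
    return solve(gram, rhs)


def mse(phi, y, v):
    """Mean squared error of the linear readout v on features phi."""
    return sum((sum(p * v_i for p, v_i in zip(row, v)) - y_k) ** 2
               for row, y_k in zip(phi, y)) / len(y)


def generalisation_gap(P=200, P_test=2000, N_in=5, N_out=10, seed=0):
    rng = random.Random(seed)
    # Hypothetical teacher: y = tanh(t . x) with a fixed random vector t.
    teacher = [rng.gauss(0.0, 1.0) for _ in range(N_in)]
    # Frozen hidden weights: drawn once, never trained.
    hidden_w = [[rng.gauss(0.0, 1.0) / math.sqrt(N_in) for _ in range(N_in)]
                for _ in range(N_out)]

    def sample(n):
        xs = [[rng.gauss(0.0, 1.0) for _ in range(N_in)] for _ in range(n)]
        ys = [math.tanh(sum(t * x_i for t, x_i in zip(teacher, x))) for x in xs]
        return [features(x, hidden_w) for x in xs], ys

    phi_train, y_train = sample(P)
    phi_test, y_test = sample(P_test)
    v = fit_last_layer(phi_train, y_train)  # only the last layer is optimised
    train = mse(phi_train, y_train, v)
    test = mse(phi_test, y_test, v)
    return {"train": train, "test": test, "gap": test - train,
            "reference": 2 * N_out / P}


if __name__ == "__main__":
    for P in (50, 200, 800):
        print(P, generalisation_gap(P=P))
```

Running the script for increasing P gives a rough feel for how the empirical gap of a last-layer fit changes with dataset size. Any quantitative comparison with the paper would need its exact definitions of the loss and of the teacher.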
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms
