Universal mean field upper bound for the generalisation gap of deep neural networks

S. Ariosto; R. Pacelli; F. Ginelli; M. Gherardi; P. Rotondo

arXiv:2201.11022 · cond-mat.dis-nn · March 3, 2022

TL;DR

This paper uses replica mean field theory to derive a universal upper bound on the generalisation gap of deep neural networks. The bound decays with dataset size faster than existing bounds and is tested across a range of architectures.

Contribution

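It introduces a theoretical framework that combines replica mean field theory with statistical learning theory to bound the generalisation gap of DNNs. The framework covers models with quenched features, including DNNs in which only the last layer is optimised while the remaining weights are held fixed.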

Findings

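01. The generalisation gap approaches zero faster than $2 N_{\mathrm{out}}/P$ for large datasets, where $P$ is the size of the dataset and $N_{\mathrm{out}}$ is the size of the last layer.

02. The bound improves on existing bounds from statistical learning theory.

03. The predictions are validated on diverse neural network architectures.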

Abstract

Modern deep neural networks (DNNs) represent a formidable challenge for theorists: according to the commonly accepted probabilistic framework that describes their performance, these architectures should overfit due to the huge number of parameters to train, but in practice they do not. Here we employ results from replica mean field theory to compute the generalisation gap of machine learning models with quenched features, in the teacher-student scenario and for regression problems with a quadratic loss function. Notably, this framework includes the case of DNNs where the last layer is optimised given a specific realisation of the remaining weights. We show how these results, combined with ideas from statistical learning theory, provide a stringent asymptotic upper bound on the generalisation gap of fully trained DNNs as a function of the size of the dataset $P$. In particular, in the…
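The setup described in the abstract (quenched features, a teacher-student regression task, quadratic loss, and a last layer optimised on top of fixed weights) can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `tanh` random-feature map, the `sin` teacher, the Gaussian inputs, the tiny ridge term, and all function names are hypothetical. The generalisation gap is taken here to be test loss minus training loss. The value `2*N_out/P` is printed only for scale; the sketch does not attempt to reproduce the paper's normalisation or its asymptotic regime. It uses only the Python standard library.

```python
import math
import random

def make_features(d_in, n_out, rng):
    """Draw a frozen ("quenched") random hidden layer and return its feature map."""
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(d_in)) for _ in range(d_in)]
         for _ in range(n_out)]
    def phi(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    return phi

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def fit_last_layer(Phi, y, ridge=1e-8):
    """Least-squares readout: solve (Phi^T Phi + ridge * I) w = Phi^T y."""
    n = len(Phi[0])
    A = [[sum(row[i] * row[j] for row in Phi) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * yi for row, yi in zip(Phi, y)) for i in range(n)]
    return solve(A, b)

def mse(Phi, y, w):
    """Mean quadratic loss of the linear readout w on features Phi."""
    return sum((sum(wi * fi for wi, fi in zip(w, row)) - yi) ** 2
               for row, yi in zip(Phi, y)) / len(y)

def generalisation_gap(P, n_out=8, d_in=10, n_test=2000, seed=0):
    """Test loss minus training loss for a readout fitted on P examples."""
    rng = random.Random(seed)
    phi = make_features(d_in, n_out, rng)            # fixed features
    teacher_w = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
    def teacher(x):                                  # hypothetical teacher function
        return math.sin(sum(t * xi for t, xi in zip(teacher_w, x)) / math.sqrt(d_in))
    def sample(n):
        X = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(n)]
        return [phi(x) for x in X], [teacher(x) for x in X]
    Phi_tr, y_tr = sample(P)
    Phi_te, y_te = sample(n_test)
    w = fit_last_layer(Phi_tr, y_tr)                 # only the last layer is trained
    return mse(Phi_te, y_te, w) - mse(Phi_tr, y_tr, w)

if __name__ == "__main__":
    n_out = 8
    for P in (50, 200, 800):
        gap = generalisation_gap(P, n_out=n_out)
        print(f"P={P:4d}  gap={gap:.5f}  2*N_out/P={2 * n_out / P:.5f}")
```

A single run is noisy, so averaging the gap over several values of `seed` gives a clearer picture of how it shrinks as `P` grows at fixed `n_out`.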

Datasets

Kylan12/Synthetic-AI-ML-Dataset
dataset · 42 downloads

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms