Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel

Dominic Richards; Ilja Kuzborskij

arXiv:2107.12723 · stat.ML · November 10, 2021

TL;DR

This paper establishes new generalisation bounds for gradient descent training of overparameterised shallow neural networks, without relying on the neural tangent kernel, and demonstrates the effectiveness of early stopping in noisy regression settings.

Contribution

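The paper provides the first direct analysis of GD for shallow networks that avoids NTK assumptions and intermediate kernelisation. It derives oracle-type bounds that are tighter than existing NTK-based risk bounds and shows that GD with early stopping is consistent under label noise.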

Findings

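01 Generalisation and excess risk bounds are controlled by the interpolating network with the shortest GD path from initialisation

02 Existing NTK-based risk bounds are recovered as a special case by relaxing the oracle inequalities

03 GD with early stopping is consistent in regression with label noise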

Abstract

We revisit on-average algorithmic stability of GD for training overparameterised shallow neural networks and prove new generalisation and excess risk bounds without the Neural Tangent Kernel (NTK) or Polyak-Łojasiewicz (PL) assumptions. In particular, we show oracle-type bounds which reveal that the generalisation and excess risk of GD are controlled by an interpolating network with the shortest GD path from initialisation (in a sense, an interpolating network with the smallest relative norm). While this was known for kernelised interpolants, our proof applies directly to networks trained by GD without intermediate kernelisation. At the same time, by relaxing the oracle inequalities developed here, we recover existing NTK-based risk bounds in a straightforward way, which demonstrates that our analysis is tighter. Finally, unlike most NTK-based analyses, we focus on regression with label noise and show that GD with early stopping…
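
To make the setting in the abstract concrete, the following is a minimal sketch of full-batch GD on a shallow network for noisy regression, with early stopping. It is an illustration under stated assumptions, not the authors' method: the tanh activation, the fixed random-sign output layer scaled by 1/sqrt(m), the step size, and all function names (`init_network`, `grad_W`, `gd_early_stopping`) are hypothetical choices. In particular, stopping is chosen here by a held-out validation set, a practical stand-in for the stopping rule analysed in the paper, which the truncated abstract does not specify. The returned distance from initialisation is only a rough proxy for the "GD path from initialisation" that the bounds refer to.

```python
import numpy as np


def init_network(d, m, rng):
    """Hidden weights W (m x d) are trained; output weights u are fixed (assumption)."""
    W = rng.normal(size=(m, d))
    u = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
    return W, u


def predict(W, u, X):
    """Shallow network f(x) = sum_k u_k * tanh(<w_k, x>)."""
    return np.tanh(X @ W.T) @ u


def mse(W, u, X, y):
    """Mean squared error on (X, y)."""
    return float(np.mean((predict(W, u, X) - y) ** 2))


def grad_W(W, u, X, y):
    """Gradient of the mean squared error with respect to the hidden weights W."""
    H = np.tanh(X @ W.T)                         # hidden activations, shape (n, m)
    r = H @ u - y                                # residuals, shape (n,)
    G = r[:, None] * (1.0 - H ** 2) * u[None, :] # chain rule through tanh, shape (n, m)
    return (2.0 / len(y)) * (G.T @ X)            # shape (m, d)


def gd_early_stopping(X_tr, y_tr, X_val, y_val, m=128, lr=0.1, steps=300, seed=0):
    """Run full-batch GD and keep the iterate with the lowest validation error.

    Validation-based stopping is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    W, u = init_network(X_tr.shape[1], m, rng)
    W0 = W.copy()
    best_val, best_t, best_W = mse(W, u, X_val, y_val), 0, W.copy()
    for t in range(1, steps + 1):
        W = W - lr * grad_W(W, u, X_tr, y_tr)
        val = mse(W, u, X_val, y_val)
        if val < best_val:
            best_val, best_t, best_W = val, t, W.copy()
    # Frobenius distance of the selected iterate from initialisation.
    dist = float(np.linalg.norm(best_W - W0))
    return best_W, u, best_t, best_val, dist


if __name__ == "__main__":
    # Synthetic noisy regression problem (hypothetical data, for illustration only).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=300)
    _, _, t_stop, val_err, dist = gd_early_stopping(X[:200], y[:200], X[200:], y[200:])
    print(f"stopped at step {t_stop}, validation MSE {val_err:.3f}, ||W_t - W_0|| = {dist:.3f}")
```

Running the script prints the selected stopping step, the validation error at that step, and how far the hidden weights moved from initialisation. Tracking that distance alongside the validation error gives an informal view of the trade-off between fitting noisy labels and staying close to the starting point.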

Videos

Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and ELM · Machine Learning and Data Classification

Methods: Neural Tangent Kernel · Early Stopping