Non-Vacuous Generalisation Bounds for Shallow Neural Networks

Felix Biggs; Benjamin Guedj

arXiv:2202.01627 · cs.LG · October 21, 2022 · 1 cites

TL;DR

This paper derives new PAC-Bayesian generalisation bounds for single-hidden-layer neural networks with erf or GELU activations and $L_{2}$-normalised data. The bounds apply to networks with deterministic parameters and are empirically non-vacuous on MNIST and Fashion-MNIST.

Contribution

It introduces PAC-Bayesian generalisation bounds for shallow neural networks that hold for deterministic rather than randomised parameters, restricted to erf or GELU activations and $L_{2}$-normalised data.

Findings

01 Bounds are empirically non-vacuous on MNIST and Fashion-MNIST.

02 Bounds apply to networks trained with vanilla stochastic gradient descent.

03 The analysis is restricted to single-hidden-layer networks with a sigmoid-shaped Gaussian error function ("erf") activation or a GELU activation.

Abstract

We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_{2}$-normalised data and either a sigmoid-shaped Gaussian error function ("erf") activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.
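
The network class described in the abstract can be illustrated with a minimal sketch. It is written in standard-library Python for readability and is not taken from the official repository. The function names (`l2_normalise`, `erf_activation`, `gelu_activation`, `shallow_net`), the weight names `W1` and `W2`, and the absence of bias terms are all assumptions made for illustration. GELU is computed in its exact form $z\,\Phi(z)$, where $\Phi$ is the standard normal CDF.

```python
import math
import random


def l2_normalise(x):
    """Scale x to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0.0 else list(x)


def erf_activation(z):
    """Sigmoid-shaped Gaussian error function activation."""
    return math.erf(z)


def gelu_activation(z):
    """Exact GELU: z * Phi(z), with Phi the standard normal CDF."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def shallow_net(x, W1, W2, activation):
    """Single hidden layer: W2 @ activation(W1 @ x / ||x||_2).

    Assumption: no bias terms; W1 is (hidden x input), W2 is (output x hidden).
    """
    x = l2_normalise(x)
    hidden = [activation(sum(w * v for w, v in zip(row, x))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


if __name__ == "__main__":
    # Hypothetical random weights, purely to exercise the forward pass.
    rng = random.Random(0)
    W1 = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
    W2 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)]
    x = [1.0, 2.0, 3.0, 4.0]
    print(shallow_net(x, W1, W2, erf_activation))
    print(shallow_net(x, W1, W2, gelu_activation))
```

Because the input is normalised before the first layer, the output depends only on the direction of `x`, not its scale: `shallow_net(x, ...)` and `shallow_net([10 * v for v in x], ...)` agree up to floating-point error.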

Code & Models

Repositories

biggs/shallow-nets (JAX, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Gaussian Processes and Bayesian Inference