Non-Vacuous Generalisation Bounds for Shallow Neural Networks
Felix Biggs, Benjamin Guedj

TL;DR
This paper derives new PAC-Bayesian generalisation bounds for single-hidden-layer neural networks with erf or GELU activations and L2-normalised data. The bounds apply to deterministic parameters and are empirically non-vacuous on MNIST and Fashion-MNIST.
Contribution
It introduces PAC-Bayesian generalisation bounds that hold for shallow neural networks with deterministic rather than randomised parameters, restricted to erf or GELU activations and L2-normalised data.
Findings
Bounds are empirically non-vacuous on MNIST and Fashion-MNIST.
Applicable to networks trained with vanilla stochastic gradient descent.
Focuses on networks with either a sigmoid-shaped Gaussian error function ("erf") activation or a GELU activation.
Abstract
We focus on a specific class of shallow neural networks with a single hidden layer, namely those with L2-normalised data and either a sigmoid-shaped Gaussian error function ("erf") activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.
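To make the network class in the abstract concrete, here is a minimal sketch of a forward pass through a single-hidden-layer network with L2-normalised inputs and either an erf or a GELU activation. It is an illustration only, not the authors' implementation: the function names, the absence of bias terms, the Gaussian initialisation, and the layer sizes are all assumptions, and the paper's exact parameterisation may differ.

```python
import math
import random


def l2_normalise(x):
    """Rescale a vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalise the zero vector")
    return [v / norm for v in x]


def erf_activation(z):
    """Sigmoid-shaped Gaussian error function, with values in (-1, 1)."""
    return math.erf(z)


def gelu(z):
    """Exact GELU: z * Phi(z), where Phi is the standard normal CDF."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def shallow_forward(x, hidden_weights, output_weights, activation):
    """One hidden layer, no biases (an assumption made for this sketch)."""
    x = l2_normalise(x)
    hidden = [
        activation(sum(w * v for w, v in zip(row, x)))
        for row in hidden_weights
    ]
    return [sum(w * h for w, h in zip(row, hidden)) for row in output_weights]


def init_weights(rows, cols, rng, scale=1.0):
    """Hypothetical Gaussian initialisation, used only to obtain some weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
U = init_weights(8, 4, rng)  # hidden layer: 8 units, 4 input features
V = init_weights(3, 8, rng)  # output layer: 3 classes

x = [3.0, 0.0, 4.0, 0.0]
logits_erf = shallow_forward(x, U, V, erf_activation)
logits_gelu = shallow_forward(x, U, V, gelu)
```

Because the input is normalised before the hidden layer, rescaling `x` by any positive constant leaves the outputs unchanged. With the erf activation each hidden unit is also bounded in (-1, 1), so every output is bounded by the sum of absolute output weights in its row.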
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Gaussian Processes and Bayesian Inference
