Optimisation & Generalisation in Networks of Neurons
Jeremy Bernstein

TL;DR
This thesis develops theoretical foundations for optimisation and generalisation in neural networks: a framework for deriving architecture-dependent first-order optimisation algorithms whose hyperparameters transfer across learning problems, and a correspondence between ensembles of networks and individual networks.
Contribution
It introduces a novel framework for architecture-dependent optimization algorithms and establishes a new link between network ensembles and individual networks for generalization analysis.
Findings
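The optimisation methods derived from the framework transfer hyperparameters across learning problems.
As network width and normalised margin are taken large, the networks that interpolate a training set concentrate on a "Bayes point machine".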
Provides a new perspective on regularization in overparameterized networks.
Abstract
The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks. On optimisation, a new theoretical framework is proposed for deriving architecture-dependent first-order optimisation algorithms. The approach works by combining a "functional majorisation" of the loss function with "architectural perturbation bounds" that encode an explicit dependence on neural architecture. The framework yields optimisation methods that transfer hyperparameters across learning problems. On generalisation, a new correspondence is proposed between ensembles of networks and individual networks. It is argued that, as network width and normalised margin are taken large, the space of networks that interpolate a particular training set concentrates on an aggregated Bayesian method known as a "Bayes point machine". This correspondence…
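The idea of a "Bayes point machine" mentioned in the abstract can be illustrated with a toy sketch. The code below is not the thesis's construction: it assumes linear classifiers through the origin rather than neural networks, and all function names are hypothetical. It samples an ensemble of classifiers that interpolate a small training set, averages their weights into a single "Bayes point" classifier, and measures how often that single classifier agrees with the ensemble's majority vote.

```python
import math
import random


def sign(v):
    # Map a real number to a label in {-1, +1}.
    return 1 if v > 0 else -1


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def random_direction(rng, dim):
    # Uniform random unit vector, obtained by normalising a Gaussian sample.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(v, v))
    return [vi / norm for vi in v]


def interpolates(w, xs, ys):
    # True if the linear classifier w labels every training point correctly.
    return all(sign(dot(w, x)) == y for x, y in zip(xs, ys))


def sample_interpolators(rng, xs, ys, count):
    # Rejection sampling: keep random classifiers that fit the training set.
    dim = len(xs[0])
    kept = []
    while len(kept) < count:
        w = random_direction(rng, dim)
        if interpolates(w, xs, ys):
            kept.append(w)
    return kept


def bayes_point(ensemble):
    # Centre of mass of the sampled interpolating classifiers.
    dim = len(ensemble[0])
    return [sum(w[i] for w in ensemble) / len(ensemble) for i in range(dim)]


def ensemble_vote(ensemble, x):
    # Majority vote over the ensemble (an odd ensemble size avoids ties).
    return sign(sum(sign(dot(w, x)) for w in ensemble))


rng = random.Random(0)
dim, n_train, n_test = 2, 5, 1000

# Assumed toy data: labels come from a hidden linear rule w_true.
w_true = random_direction(rng, dim)
train_x = [random_direction(rng, dim) for _ in range(n_train)]
train_y = [sign(dot(w_true, x)) for x in train_x]

ensemble = sample_interpolators(rng, train_x, train_y, count=201)
centre = bayes_point(ensemble)

test_x = [random_direction(rng, dim) for _ in range(n_test)]
agreement = sum(
    ensemble_vote(ensemble, x) == sign(dot(centre, x)) for x in test_x
) / n_test

print("Bayes point fits training set:", interpolates(centre, train_x, train_y))
print("Agreement with ensemble vote:", agreement)
```

In this linear setting the interpolating classifiers form a convex cone, so their average also interpolates the training set. The abstract's claim concerns neural networks in the limit of large width and normalised margin, which this sketch does not attempt to reproduce.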
Taxonomy
Topics: Neural Networks and Applications · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms
