Generalisation and the Risk–Entropy Curve
Dominic Belcher, Antonia Marcu, Adam Prügel-Bennett

TL;DR
This paper introduces the risk entropy, the logarithm of the distribution of risks, to explain the generalisation performance of deep neural networks, and shows that this performance depends strongly on the data distribution and not just on model capacity.
Contribution
It defines risk entropy as a key factor in generalisation, shows how to empirically infer it for deep networks, and links its behavior to practical generalisation performance.
Findings
Risk entropy can be empirically estimated using MCMC techniques.
Generalisation performance depends on the distribution of risks (the risk entropy) and on the data distribution, not just on model capacity.
The behaviour of the risk entropy before the asymptotic regime is reached determines generalisation in practice.
Abstract
In this paper we show that the expected generalisation performance of a learning machine is determined by the distribution of risks or, equivalently, its logarithm (a quantity we term the risk entropy) and the fluctuations in a quantity we call the training ratio. We show that the risk entropy can be empirically inferred for deep neural network models using Markov Chain Monte Carlo techniques. Results are presented for different deep neural networks on a variety of problems. The asymptotic behaviour of the risk entropy acts in an analogous way to the capacity of the learning machine, but the generalisation performance experienced in practical situations is determined by the behaviour of the risk entropy before the asymptotic regime is reached. This performance is strongly dependent on the distribution of the data (features and targets) and not just on the capacity of the learning…
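To make the idea of a "distribution of risks" concrete, here is a minimal sketch under stated assumptions; it is not the authors' method. It assumes a toy perceptron that learns a random teacher, draws student weights from a Gaussian prior, and estimates the log-density of their empirical risks by plain Monte Carlo sampling with a histogram. The abstract says the paper uses Markov Chain Monte Carlo; simple sampling like this only reaches the bulk of the curve, since low-risk models are too rare to hit by chance. The function names `sample_risks` and `risk_entropy`, the Gaussian prior, and all parameter values are illustrative choices, not taken from the paper.

```python
import math
import random


def sample_risks(dim=5, n_points=200, n_models=2000, seed=0):
    """Empirical risks of randomly drawn perceptrons against a fixed teacher.

    Assumption: weights and inputs are drawn from a standard Gaussian.
    """
    rng = random.Random(seed)

    def gauss_vec():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    teacher = gauss_vec()
    data = [gauss_vec() for _ in range(n_points)]
    labels = [dot(teacher, x) >= 0 for x in data]

    risks = []
    for _ in range(n_models):
        w = gauss_vec()
        # Risk = fraction of points the student labels differently from the teacher.
        errors = sum((dot(w, x) >= 0) != y for x, y in zip(data, labels))
        risks.append(errors / n_points)
    return risks


def risk_entropy(risks, n_bins=10):
    """Log of the histogram density of risks on [0, 1]; empty bins are skipped."""
    counts = [0] * n_bins
    for r in risks:
        counts[min(int(r * n_bins), n_bins - 1)] += 1
    width = 1.0 / n_bins
    total = len(risks)
    return {
        (i + 0.5) * width: math.log(c / (total * width))
        for i, c in enumerate(counts)
        if c > 0
    }


risks = sample_risks()
curve = risk_entropy(risks)
for r, s in sorted(curve.items()):
    print(f"risk ~ {r:.2f}  log-density = {s:+.2f}")
```

In this toy setting the estimated curve should peak near a risk of 0.5 and fall away towards low risk; how steeply it falls is the kind of behaviour the abstract ties to generalisation.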
Taxonomy
Topics: Neural Networks and Applications
