
TL;DR
This paper challenges the idea that flat minima cause better generalization, proposing instead that weakness, the volume of completions compatible with the learned function, is the actual driver of generalization performance.
Contribution
The paper introduces the concept of weakness as a reparameterisation-invariant measure that better explains generalization than flatness, supported by theoretical proofs and empirical results.
Findings
Weakness correlates positively with generalization on MNIST.
Flatness and simplicity are dataset-dependent and less predictive.
Large-batch generalization advantage diminishes with more data.
Abstract
Neural networks that land in flat regions of the loss landscape tend to generalise better than those in sharp regions. Sharpness-Aware Minimisation exploits this to improve generalisation. But function-preserving reparameterisation can inflate the Hessian of any minimum by two orders of magnitude without changing a single prediction. If the geometry of weight space can be manufactured from nothing, it cannot be the cause of anything. In other words, flat is simple and simplicity depends on encoding. Here I show that the actual driver is weakness, the volume of completions compatible with the learned function in the learner's embodied language. Weakness is reparameterisation-invariant because it is defined over what the network does, not how it is parameterised. I prove weakness is minimax-optimal under exchangeable demands, and that PAC-Bayes bounds work because they correlate…
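The reparameterisation argument in the abstract can be illustrated with a toy sketch. This is not the paper's experiment: the one-hidden-unit ReLU network, the four data points, the scaling factor `alpha`, and the helper names `predict`, `loss`, and `sharpness` are all assumptions chosen for illustration. Scaling the input weight by `alpha` and the output weight by `1/alpha` leaves every prediction unchanged, yet the largest Hessian eigenvalue at the minimum grows sharply.

```python
import math

def predict(w1, w2, x):
    """One-hidden-unit ReLU network: f(x) = w2 * relu(w1 * x)."""
    return w2 * max(0.0, w1 * x)

def loss(w1, w2, data):
    """Mean squared error over (x, y) pairs."""
    return sum((predict(w1, w2, x) - y) ** 2 for x, y in data) / len(data)

def sharpness(w1, w2, data, h=1e-4):
    """Largest eigenvalue of the 2x2 loss Hessian (central finite differences)."""
    f = lambda a, b: loss(a, b, data)
    h11 = (f(w1 + h, w2) - 2 * f(w1, w2) + f(w1 - h, w2)) / h**2
    h22 = (f(w1, w2 + h) - 2 * f(w1, w2) + f(w1, w2 - h)) / h**2
    h12 = (f(w1 + h, w2 + h) - f(w1 + h, w2 - h)
           - f(w1 - h, w2 + h) + f(w1 - h, w2 - h)) / (4 * h**2)
    # Closed-form largest eigenvalue of a symmetric 2x2 matrix.
    mean = (h11 + h22) / 2
    return mean + math.sqrt(((h11 - h22) / 2) ** 2 + h12**2)

# Hypothetical toy targets y = 2 * relu(x): any (w1, w2) with w1 > 0 and
# w1 * w2 == 2 fits them exactly, so all such points are zero-loss minima.
data = [(x, 2.0 * max(0.0, x)) for x in (-1.0, 0.5, 1.0, 2.0)]

alpha = 32.0                                 # illustrative scaling factor
base = (1.0, 2.0)                            # one minimum
scaled = (base[0] * alpha, base[1] / alpha)  # same function, different weights

same_predictions = all(predict(*base, x) == predict(*scaled, x) for x, _ in data)
sharp_base = sharpness(*base, data)
sharp_scaled = sharpness(*scaled, data)
print(same_predictions, sharp_base, sharp_scaled, sharp_scaled / sharp_base)
```

In this toy case the two weight settings compute the same function, while the sharpness ratio comes out at roughly 200, so a curvature-based measure would rank them very differently. A measure defined on the function itself would not distinguish them.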
