Are Flat Minima an Illusion?

Michael Timothy Bennett

arXiv:2605.05209·cs.LG·May 8, 2026

TL;DR

This paper challenges the idea that flat minima cause better generalization. It proposes instead that weakness, the volume of completions compatible with the learned function, is the true predictor of generalization performance.

Contribution

The paper introduces weakness, a reparameterisation-invariant measure that explains generalization better than flatness does, and supports it with theoretical proofs and empirical results.

Findings

01 Weakness correlates positively with generalization on MNIST.

02 Flatness and simplicity are dataset-dependent and less predictive.

03 Large-batch generalization advantage diminishes with more data.

Abstract

Neural networks that land in flat regions of the loss landscape tend to generalise better than those in sharp regions. Sharpness-Aware Minimisation exploits this to improve generalisation. But function-preserving reparameterisation can inflate the Hessian of any minimum by two orders of magnitude without changing a single prediction. If the geometry of weight space can be manufactured from nothing, it cannot be the cause of anything. In other words, flat is simple and simplicity depends on encoding. Here I show that the actual driver is weakness, the volume of completions compatible with the learned function in the learner's embodied language. Weakness is reparameterisation-invariant because it is defined over what the network does, not how it is parameterised. I prove weakness is minimax-optimal under exchangeable demands, and that PAC-Bayes bounds work because they correlate…
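The reparameterisation argument in the abstract can be illustrated with a toy example. The sketch below is not the paper's construction: it assumes a hypothetical two-parameter ReLU model f(x) = w2 * relu(w1 * x), a mean squared error loss, and a finite-difference Hessian, and all function names are invented for illustration. It relies only on the fact that relu(alpha * z) = alpha * relu(z) for alpha > 0, so scaling w1 by alpha and w2 by 1/alpha leaves every prediction unchanged while changing the curvature of the loss.

```python
import math

def predict(params, x):
    """Toy model (an assumption, not from the paper): f(x) = w2 * relu(w1 * x)."""
    w1, w2 = params
    return w2 * max(w1 * x, 0.0)

def loss(params, data):
    """Mean squared error over (x, y) pairs."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def rescale(params, alpha):
    """Function-preserving map for alpha > 0: (w1, w2) -> (alpha * w1, w2 / alpha)."""
    w1, w2 = params
    return (alpha * w1, w2 / alpha)

def hessian(f, params, eps=1e-4):
    """Central finite-difference Hessian of f at params."""
    n = len(params)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                p = list(params)
                p[i] += di * eps
                p[j] += dj * eps
                return f(tuple(p))
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps * eps)
    return H

def top_eigenvalue(H):
    """Largest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    a, b, d = H[0][0], H[0][1], H[1][1]
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)

# Targets y = x, so theta = (1, 1) is a zero-loss minimum.
DATA = [(x, x) for x in (0.5, 1.0, 1.5, 2.0)]
theta = (1.0, 1.0)
theta_rescaled = rescale(theta, alpha=10.0)

sharp_before = top_eigenvalue(hessian(lambda p: loss(p, DATA), theta))
sharp_after = top_eigenvalue(hessian(lambda p: loss(p, DATA), theta_rescaled))
same_predictions = all(
    math.isclose(predict(theta, x), predict(theta_rescaled, x), rel_tol=1e-12)
    for x, _ in DATA
)

print("predictions unchanged:", same_predictions)
print("sharpness ratio:", sharp_after / sharp_before)
```

In this toy setting the top Hessian eigenvalue at a zero-loss minimum is proportional to w1^2 + w2^2, so alpha = 10 raises it by a factor of about 50 while the function stays the same. Larger alpha inflates it further. The example only shows that a Hessian-based sharpness measure depends on the parameterisation; it says nothing about how weakness itself is computed.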
