Unveiling the structure of wide flat minima in neural networks
Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Gabriele Perugini, Riccardo Zecchina

TL;DR
This paper investigates the structure of wide flat minima in neural networks, showing that they form from high-margin solutions and offering insight into when flat minima emerge as model size increases.
Contribution
It introduces a new analytical perspective on the formation of wide flat minima and their relation to high-margin solutions in neural networks.
Findings
Wide flat minima form from high-margin configurations.
High-margin minima are exponentially rare but concentrated in specific regions.
The analysis offers a method to estimate the emergence of flat minima based on model size.
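To make "high-margin solutions are rare" concrete, here is a minimal sketch. It assumes a toy binary perceptron with ±1 weights and ±1 patterns, chosen for illustration and not taken from the paper. It defines the margin of a weight vector as the smallest normalized stability sigma * (w . xi) / sqrt(N) over the training set, then counts by exhaustive enumeration how many weight vectors reach each margin threshold. The helpers `margin` and `count_solutions` are hypothetical illustrations, not the authors' code.

```python
import itertools
import math
import random


def margin(w, patterns, labels):
    """Smallest normalized stability sigma * (w . xi) / sqrt(N) over all patterns."""
    n = len(w)
    return min(
        s * sum(wi * xi for wi, xi in zip(w, x)) / math.sqrt(n)
        for x, s in zip(patterns, labels)
    )


def count_solutions(patterns, labels, n, kappas):
    """Count, by exhaustive enumeration, binary weight vectors with margin >= kappa."""
    margins = [margin(w, patterns, labels)
               for w in itertools.product((-1, 1), repeat=n)]
    return {k: sum(m >= k for m in margins) for k in kappas}


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 11, 4  # odd n means w . xi is never zero
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    labels = [rng.choice((-1, 1)) for _ in range(p)]
    for k, c in count_solutions(patterns, labels, n, [0.0, 0.5, 1.0, 1.5]).items():
        print(f"margin >= {k}: {c} of {2 ** n} weight vectors")
```

By construction the counts can only shrink as the threshold grows. Exhaustive enumeration is feasible only for very small N, so this toy cannot reproduce the exponential rarity the paper describes. It only shows what is being counted.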
Abstract
The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of neural networks. Moreover, such algorithms are able to fit the data even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima achieved by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in nonconvex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here we show that wide flat…
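The abstract ties generalization to the "flatness" of a minimum. The sketch below probes one simple version of that idea. It flips a random fraction of a zero-error solution's binary weights and averages the resulting training error, so a flatter minimum shows a smaller error increase. This perturbation proxy, the teacher-generated toy data, and the names `train_error` and `flatness_proxy` are all assumptions for illustration. They are not necessarily the flatness measure the authors use.

```python
import random


def train_error(w, patterns, labels):
    """Fraction of patterns with sigma * (w . xi) <= 0, i.e. misclassified."""
    wrong = sum(
        1 for x, s in zip(patterns, labels)
        if s * sum(wi * xi for wi, xi in zip(w, x)) <= 0
    )
    return wrong / len(patterns)


def flatness_proxy(w, patterns, labels, flip_frac, samples, rng):
    """Average training error after flipping a random fraction of binary weights.

    Under this proxy, a lower value at a given flip_frac means a wider (flatter)
    minimum around w.
    """
    n = len(w)
    k = round(flip_frac * n)
    total = 0.0
    for _ in range(samples):
        w_pert = list(w)
        for i in rng.sample(range(n), k):  # k distinct weights to flip
            w_pert[i] = -w_pert[i]
        total += train_error(w_pert, patterns, labels)
    return total / samples


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 101, 50  # odd n means w . xi is never zero
    teacher = [rng.choice((-1, 1)) for _ in range(n)]
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    # Labels come from the teacher, so `teacher` is a zero-error solution.
    labels = [1 if sum(t * x for t, x in zip(teacher, xi)) > 0 else -1
              for xi in patterns]
    for frac in (0.0, 0.05, 0.1, 0.2):
        err = flatness_proxy(teacher, patterns, labels, frac, 100, rng)
        print(f"flip fraction {frac:.2f}: mean train error {err:.3f}")
```

Comparing this curve across different zero-error solutions of the same data is one rough way to rank them by width. A solution whose error stays low under larger flip fractions sits in a wider region.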
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM
