Unveiling the structure of wide flat minima in neural networks

Carlo Baldassi; Clarissa Lauditi; Enrico M. Malatesta; Gabriele Perugini; Riccardo Zecchina

arXiv:2107.01163 · cond-mat.dis-nn · February 15, 2022

Open Access

TL;DR

This paper studies the structure of wide flat minima in neural networks, showing that they form from high-margin solutions and indicating when flat minima emerge as model size increases.

Contribution

The paper introduces a new analytical perspective on how wide flat minima form and how they relate to high-margin solutions in neural networks.

Findings

01. Wide flat minima form from high-margin configurations.

02. High-margin minima are exponentially rare but concentrated in specific regions.

03. The analysis offers a method to estimate the emergence of flat minima based on model size.
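The two quantities these findings rely on, margin and flatness, can be made concrete with a small numerical sketch. The code below is an illustration under stated assumptions, not the paper's analytical method: it assumes a single-layer perceptron with real-valued weights, takes the margin to be the smallest normalized stability over the training patterns, and uses the average training error under random weight perturbations as a rough flatness proxy. The function names `margin` and `perturbed_error` and the parameter `sigma` are hypothetical.

```python
import numpy as np


def margin(w, X, y):
    """Smallest normalized stability: min over patterns of y * (w . x) / ||w||.

    Assumption: a linear perceptron with labels y in {-1, +1}.
    A positive value means every pattern is classified correctly.
    """
    return float(np.min(y * (X @ w)) / np.linalg.norm(w))


def perturbed_error(w, X, y, sigma, n_samples=200, seed=0):
    """Average training error after adding Gaussian noise to the weights.

    Assumption: this is only a crude flatness proxy. The noise has total
    norm of roughly sigma * ||w||, so sigma is a relative perturbation size.
    Lower values at a given sigma suggest a flatter neighbourhood.
    """
    rng = np.random.default_rng(seed)
    scale = sigma * np.linalg.norm(w) / np.sqrt(w.size)
    errors = []
    for _ in range(n_samples):
        w_noisy = w + scale * rng.standard_normal(w.size)
        # Fraction of patterns with non-positive stability after the perturbation.
        errors.append(np.mean(y * (X @ w_noisy) <= 0))
    return float(np.mean(errors))
```

Given two weight vectors that both fit the same data, one could compare `margin` and `perturbed_error` for each over a range of `sigma` values to check whether the higher-margin solution also sits in a flatter region, which is the relationship the findings describe.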

Abstract

The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of neural networks. Moreover, such algorithms are able to fit the data even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima achieved by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in nonconvex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here we show that wide flat…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM