Sharp Minima Can Generalize: A Loss Landscape Perspective On Data
Raymond Fan, Bryce Sandlund, Lin Myat Ko

TL;DR
This paper challenges the volume hypothesis by showing that sharp minima can generalize well, and that adding training data reshapes the loss landscape so that these previously small generalizing minima become relatively large.
Contribution
It demonstrates that sharp minima can generalize effectively and explains how increasing data alters the loss landscape to favor such minima.
Findings
Sharp minima that generalize well exist, but their small volumes make them unlikely to be found.
Increasing data changes the loss landscape so that previously small generalizing minima become relatively large.
The volume hypothesis does not fully explain generalization in deep learning.
Abstract
The volume hypothesis suggests deep learning is effective because it is likely to find flat minima due to their large volumes, and flat minima generalize well. This picture does not explain the role of large datasets in generalization. Measuring minima volumes under varying amounts of training data reveals that sharp minima that generalize well exist, but are unlikely to be found due to their small volumes. Increasing data changes the loss landscape, such that previously small generalizing minima become (relatively) large.
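As a rough illustration of what "measuring minima volumes" could look like, the sketch below estimates the log-volume of the region around a minimum where the loss stays under a threshold. It is not taken from the paper and should not be read as the authors' procedure: the names `basin_radius` and `log_basin_volume`, the threshold, and the toy quadratic losses are all assumptions made for illustration. It assumes the basin is star-shaped around the minimum, so its volume equals the unit-ball volume times the average of r^n over random directions, where r is the distance to the threshold along each direction and n is the number of parameters.

```python
import math
import numpy as np


def basin_radius(loss, theta, direction, threshold, r_max=10.0, iters=60):
    """Bisect for the largest r with loss(theta + r * direction) <= threshold.

    Assumes the loss is below the threshold at theta and crosses it once
    along the ray (a star-shaped basin).
    """
    if loss(theta + r_max * direction) <= threshold:
        return r_max  # basin extends past the search range, so clip
    lo, hi = 0.0, r_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if loss(theta + mid * direction) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo


def log_basin_volume(loss, theta, threshold, n_dirs=200, seed=0):
    """Monte Carlo estimate of log-volume of {x : loss(x) <= threshold} around theta."""
    rng = np.random.default_rng(seed)
    n = theta.size
    # Uniform random unit directions: normalized Gaussian samples.
    dirs = rng.standard_normal((n_dirs, n))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    log_r = np.log([basin_radius(loss, theta, d, threshold) for d in dirs])
    # log(mean(r^n)), computed stably because r^n under/overflows for large n.
    a = n * log_r
    log_mean_rn = a.max() + np.log(np.mean(np.exp(a - a.max())))
    # log-volume of the n-dimensional unit ball: pi^(n/2) / Gamma(n/2 + 1).
    log_unit_ball = 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0)
    return log_unit_ball + log_mean_rn


def quadratic(curvature):
    """Toy isotropic loss (hypothetical stand-in for a network's loss)."""
    return lambda x: 0.5 * curvature * float(np.dot(x, x))


theta = np.zeros(5)  # both toy losses have their minimum at the origin
flat = log_basin_volume(quadratic(1.0), theta, threshold=0.5)
sharp = log_basin_volume(quadratic(100.0), theta, threshold=0.5)
print(f"flat log-volume:  {flat:.3f}")
print(f"sharp log-volume: {sharp:.3f}")
```

On these toy losses the sharper minimum has the smaller log-volume, which is the sense in which a sharp minimum is "small". Working in log space matters because volumes scale as r^n and become numerically unmanageable at realistic parameter counts.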
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
