Sharp Minima Can Generalize: A Loss Landscape Perspective On Data

Raymond Fan; Bryce Sandlund; Lin Myat Ko

arXiv:2511.04808 · cs.LG · November 10, 2025

Open Access

TL;DR

This paper challenges the volume hypothesis by showing that sharp minima can also generalize well. Such minima are rarely found because their volumes are small, but increasing the amount of training data changes the loss landscape so that they become relatively large.

Contribution

It demonstrates that sharp minima can generalize well and explains how increasing the amount of training data alters the loss landscape so that such minima become relatively larger, and therefore more likely to be found.

Findings

01

Sharp minima can generalize well, but with less data their small volumes make them unlikely to be found.

02

Increasing data changes the loss landscape, so that previously small generalizing minima become relatively large.

03

The volume hypothesis does not fully explain generalization in deep learning.

Abstract

The volume hypothesis suggests deep learning is effective because it is likely to find flat minima due to their large volumes, and flat minima generalize well. This picture does not explain the role of large datasets in generalization. Measuring minima volumes under varying amounts of training data reveals that sharp minima which generalize well exist but are unlikely to be found due to their small volumes. Increasing data changes the loss landscape, such that previously small generalizing minima become (relatively) large.
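
The abstract turns on measuring the volume of a minimum. The abstract does not describe the authors' estimator, so the following is only a rough illustration of one simple way to define and estimate such a volume. Fix a loss threshold and treat the basin as the star-shaped region around the minimizer where the loss stays below it. Sample random unit directions u and bisect for the radius r(u) at which the loss crosses the threshold. The volume is then V = (unit-ball volume) · E_u[r(u)^d]. The toy quadratic bowls, the threshold, and all function names (radius_along, estimate_volume, quadratic_bowl) are hypothetical choices made for this sketch.

```python
import math
import random


def radius_along(loss, center, direction, threshold, r_max=10.0, iters=50):
    """Bisect for the distance from `center` along unit `direction` at which
    `loss` first exceeds `threshold`. Assumes the loss increases monotonically
    along the ray, which holds for the convex toy bowls used here."""
    def point(r):
        return [c + r * d for c, d in zip(center, direction)]

    if loss(point(r_max)) <= threshold:
        return r_max  # basin extends past the search cap; report the cap
    lo, hi = 0.0, r_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if loss(point(mid)) <= threshold:
            lo = mid  # still inside the sublevel set
        else:
            hi = mid  # already outside
    return lo


def random_unit_vector(dim, rng):
    """Uniform direction on the unit sphere via a normalized Gaussian sample."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def estimate_volume(loss, center, threshold, n_dirs=2000, seed=0):
    """Monte Carlo volume of the star-shaped region {x : loss(x) <= threshold}
    around `center`, using V = unit_ball_volume * mean(r(u) ** d)."""
    rng = random.Random(seed)
    dim = len(center)
    unit_ball = math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)
    total = 0.0
    for _ in range(n_dirs):
        u = random_unit_vector(dim, rng)
        total += radius_along(loss, center, u, threshold) ** dim
    return unit_ball * total / n_dirs


def quadratic_bowl(curvatures):
    """Hypothetical toy loss 0.5 * sum(k_i * x_i^2) with its minimum at 0."""
    return lambda x: 0.5 * sum(k * xi * xi for k, xi in zip(curvatures, x))


if __name__ == "__main__":
    flat = quadratic_bowl([1.0, 1.0])      # low curvature: a flat minimum
    sharp = quadratic_bowl([100.0, 100.0])  # 100x curvature: a sharp minimum
    eps = 0.01
    v_flat = estimate_volume(flat, [0.0, 0.0], eps)
    v_sharp = estimate_volume(sharp, [0.0, 0.0], eps)
    print(round(v_flat / v_sharp, 3))  # prints 100.0
```

With a fixed threshold, the bowl with 100 times the curvature has 1/100 of the volume in two dimensions, because the area of the sublevel ellipse scales as 1/sqrt(k1 k2). This is the sense in which the volume hypothesis expects sharp minima to be found less often. The star-shaped assumption holds for these convex bowls but need not hold for the loss of a real network.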

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications