Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization

Kaiyue Wen; Zhiyuan Li; Tengyu Ma

arXiv:2307.11007·cs.LG·July 25, 2023·2 cites

TL;DR

This paper critically examines the assumption that sharpness minimization leads to better generalization in neural networks, revealing complex relationships influenced by data and architecture, and suggesting the need for alternative explanations.

Contribution

It provides a theoretical and empirical analysis showing that flatness alone does not explain why models found by sharpness minimization algorithms generalize.

Findings

01 Flatness provably implies generalization in some settings for two-layer ReLU networks.

02 In other settings, non-generalizing flattest models exist and sharpness minimization algorithms fail to generalize.

03 In yet other settings, non-generalizing flattest models exist, but sharpness minimization algorithms still generalize.

Abstract

Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize; and (3) perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the…
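For readers unfamiliar with the term, the sketch below illustrates what one step of a sharpness minimization algorithm can look like. It uses the update rule of Sharpness-Aware Minimization (SAM), a widely used method of this kind. The abstract does not name a specific algorithm, so the choice of SAM is an assumption, and the names `sam_step`, `grad_fn`, `lr`, and `rho` are hypothetical, as is the toy quadratic loss. It is a minimal illustration, not the authors' experimental setup.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM-style update (illustrative sketch, not the paper's code).

    w       : current parameter vector
    grad_fn : hypothetical callable returning the loss gradient at a point
    lr      : learning rate
    rho     : radius of the perturbation used to probe sharpness
    """
    g = grad_fn(w)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        # Stationary point: no ascent direction is defined, so stay put.
        return w.copy()
    # Move to an approximate worst-case point within distance rho.
    w_adv = w + rho * g / norm
    # Descend from the original point using the gradient at the perturbed point.
    return w - lr * grad_fn(w_adv)

# Toy quadratic loss L(w) = 0.5 * w^T A w (an assumption for illustration only).
A = np.diag([2.0, 1.0])
grad_fn = lambda w: A @ w

w_next = sam_step(np.array([1.0, 0.0]), grad_fn, lr=0.1, rho=0.05)
```

Because the descent direction is evaluated at the perturbed point, regions where the gradient changes quickly under small perturbations are penalized, which is the sense in which such methods favor flatter minimizers.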

Videos

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM