FAM: Relative Flatness Aware Minimization
Linara Adilova, Amr Abourayya, Jianning Li, Amin Dada, Henning Petzka, Jan Egger, Jens Kleesiek, Michael Kamp

TL;DR
This paper introduces FAM, a new regularizer based on relative flatness that improves neural network generalization by efficiently leveraging Hessian information of a single layer, addressing theoretical and practical limitations of previous flatness measures.
Contribution
It proposes a theoretically grounded, computationally efficient flatness regularizer that works with arbitrary loss functions and large neural networks, improving generalization.
Findings
FAM improves generalization across various models and tasks.
The method requires only Hessian computation of a single layer.
FAM outperforms traditional flatness-based approaches in empirical evaluations.
Abstract
Flatness of the loss curve around a model at hand has been shown to empirically correlate with its generalization ability. Optimizing for flatness was proposed as early as 1994 by Hochreiter and Schmidhuber, and was followed by more recent successful sharpness-aware optimization techniques. Their widespread adoption in practice, though, is dubious because of the lack of a theoretically grounded connection between flatness and generalization, in particular in light of the reparameterization curse: certain reparameterizations of a neural network change most flatness measures but do not change generalization. Recent theoretical work suggests that a particular relative flatness measure can be connected to generalization and solves the reparameterization curse. In this paper, we derive a regularizer based on this relative flatness that is easy to compute, fast, efficient, and works with…
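The reparameterization curse mentioned in the abstract can be illustrated on a toy network. The sketch below is not the authors' implementation. It assumes the layer-wise relative flatness takes the form κ = Σ_{s,s'} ⟨w_s, w_s'⟩ · Tr(H_{s,s'}), where w_s are the rows of one layer's weight matrix and H_{s,s'} is the Hessian block of the loss with respect to rows s and s'; this formula is not stated in the text above and is an assumption here. The network, the data, and all function names are hypothetical, and the Hessian is estimated by finite differences purely for illustration.

```python
# Toy two-layer ReLU network f(x) = sum_s v[s] * relu(<W[s], x>), pure Python.
# Everything here (names, data, the assumed formula for kappa) is illustrative.

def relu(z):
    return z if z > 0.0 else 0.0

def loss(W, v, data):
    """Mean of 0.5 * (f(x) - y)^2 over the data set."""
    total = 0.0
    for x, y in data:
        pred = sum(v_s * relu(sum(w * xi for w, xi in zip(row, x)))
                   for v_s, row in zip(v, W))
        total += 0.5 * (pred - y) ** 2
    return total / len(data)

def hessian_entry(W, v, data, i, j, eps=1e-3):
    """Central finite-difference estimate of d^2 loss / dW[i] dW[j].

    i and j are (row, column) index pairs into the hidden-layer matrix W.
    """
    def shifted(di, dj):
        Wp = [row[:] for row in W]
        Wp[i[0]][i[1]] += di
        Wp[j[0]][j[1]] += dj
        return loss(Wp, v, data)
    return (shifted(eps, eps) - shifted(eps, -eps)
            - shifted(-eps, eps) + shifted(-eps, -eps)) / (4 * eps * eps)

def hessian_trace(W, v, data):
    """Plain (non-relative) flatness proxy: trace of the Hessian w.r.t. W."""
    return sum(hessian_entry(W, v, data, (s, k), (s, k))
               for s in range(len(W)) for k in range(len(W[0])))

def relative_flatness(W, v, data):
    """Assumed form: sum over row pairs of <w_s, w_t> * Tr(H_{s,t})."""
    rows, cols = len(W), len(W[0])
    kappa = 0.0
    for s in range(rows):
        for t in range(rows):
            inner = sum(a * b for a, b in zip(W[s], W[t]))
            trace_block = sum(hessian_entry(W, v, data, (s, k), (t, k))
                              for k in range(cols))
            kappa += inner * trace_block
    return kappa

# Inputs are chosen so every ReLU stays active and the loss is smooth in W.
data = [((1.0, 2.0), 1.0), ((2.0, 1.0), 0.5), ((0.5, 1.5), 0.8)]
W = [[1.0, 0.5], [0.3, 0.8]]
v = [0.7, -0.4]

# Layer-wise rescaling: same network function, different parameters.
alpha = 3.0
W_scaled = [[alpha * w for w in row] for row in W]
v_scaled = [v_s / alpha for v_s in v]

trace = hessian_trace(W, v, data)
trace_scaled = hessian_trace(W_scaled, v_scaled, data)
kappa = relative_flatness(W, v, data)
kappa_scaled = relative_flatness(W_scaled, v_scaled, data)

print("Hessian trace:     ", trace, "->", trace_scaled)
print("relative flatness: ", kappa, "->", kappa_scaled)
```

Scaling the hidden layer by 3 and the output layer by 1/3 leaves the network function unchanged. In this toy setting the plain Hessian trace shrinks by a factor of 9, while the relative measure stays the same up to finite-difference error. Only the Hessian with respect to the single matrix W is needed, which matches the single-layer claim in the findings.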
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Ferroelectric and Negative Capacitance Devices
Methods: Attentive Walk-Aggregating Graph Neural Network
