Flatness After All?
Neta Shoham, Liron Mor-Yosef, Haim Avron

TL;DR
This paper proposes a new flatness measure based on the soft rank of the Hessian to better predict generalization gaps in neural networks, especially when traditional sharpness measures are unreliable.
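For illustration only, here is one common way to define a soft rank of a positive semidefinite Hessian H ∈ R^{d×d} with eigenvalues λ_1, …, λ_d ≥ 0. It uses a damping parameter ε > 0. Both the form and ε are assumptions made here, not necessarily the paper's exact measure:

```latex
\operatorname{srank}_\varepsilon(H)
  = \sum_{i=1}^{d} \frac{\lambda_i}{\lambda_i + \varepsilon}
  = \operatorname{tr}\!\left(H\,(H + \varepsilon I)^{-1}\right),
\qquad 0 \le \operatorname{srank}_\varepsilon(H) < d,
\qquad \text{versus} \quad
\operatorname{tr}(H) = \sum_{i=1}^{d} \lambda_i,
\quad \|H\|_2 = \max_i \lambda_i .
```

Each term lies in [0, 1), so this quantity softly counts the directions whose curvature exceeds ε. The trace and the spectral norm grow without bound as the curvature grows, but the soft rank cannot.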
Contribution
It introduces a soft rank flatness measure that accurately estimates generalization gaps in calibrated models, remains reliable for non-calibrated models, and relates the measure to established criteria.
Findings
The soft rank measure correlates well with the true generalization gap.
It is more robust than traditional sharpness measures.
Experimental results validate the effectiveness of the proposed method.
Abstract
Recent literature on generalization in deep learning has examined the relationship between the curvature of the loss function at minima and generalization, mainly in the context of overparameterized neural networks. A key observation is that "flat" minima tend to generalize better than "sharp" minima. While this idea is supported by empirical evidence, it has also been shown that deep networks can generalize even with arbitrary sharpness, as measured by either the trace or the spectral norm of the Hessian. In this paper, we argue that generalization could be assessed by measuring flatness using a soft rank measure of the Hessian. We show that when an exponential family neural network model is exactly calibrated, and its prediction error and its confidence in the prediction are not correlated with the first and second derivatives of the network's output, our measure accurately captures…
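The sketch below contrasts the two sharpness measures named in the abstract (trace and spectral norm of the Hessian) with a soft rank. It works on a toy matrix rather than a trained network. It assumes the soft rank is sum_i λ_i / (λ_i + eps) with a hypothetical default `eps = 1e-3`. This is an illustrative definition, not necessarily the paper's. The helper `hessian_measures` and the random low-rank "Hessian" are also hypothetical.

```python
import numpy as np


def hessian_measures(H: np.ndarray, eps: float = 1e-3) -> dict:
    """Return trace, spectral norm and soft rank of a symmetric PSD matrix H.

    The soft rank sum_i lambda_i / (lambda_i + eps) and the default eps are
    illustrative assumptions, not necessarily the paper's exact measure.
    """
    H = 0.5 * (H + H.T)  # symmetrize to remove round-off asymmetry
    eigvals = np.linalg.eigvalsh(H)  # real eigenvalues of a symmetric matrix
    eigvals = np.clip(eigvals, 0.0, None)  # assume PSD: zero out tiny negative noise
    return {
        "trace": float(eigvals.sum()),
        "spectral_norm": float(eigvals.max()),
        "soft_rank": float(np.sum(eigvals / (eigvals + eps))),
    }


# Toy stand-in for a Hessian: 50 parameters, only 5 curved directions.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
H = A @ A.T

for scale in (1.0, 100.0):
    m = hessian_measures(scale * H)
    print(f"scale={scale:>5}: trace={m['trace']:.1f}, "
          f"spectral_norm={m['spectral_norm']:.1f}, soft_rank={m['soft_rank']:.3f}")
```

Scaling the curvature by 100 multiplies the trace and the spectral norm by 100. The soft rank stays close to 5, the number of curved directions, and is always below the parameter count. This only illustrates that the measure is bounded. It does not reproduce the paper's argument about calibration or the generalization gap.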
