Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization

Chengli Tan; Yubo Zhou; Haishan Ye; Guang Dai; Junmin Liu; Zengjie Song; Jiangshe Zhang; Zixiang Zhao; Yunda Hao; Yong Xu

arXiv:2505.23866·cs.LG·June 2, 2025

Open Access · 3 Reviews

TL;DR

This paper investigates how sharpness-aware minimization (SAM) improves neural network calibration, proposing a variant called CSAM that further reduces calibration error, with extensive experiments confirming its effectiveness.

Contribution

The paper provides a theoretical analysis linking SAM to improved calibration and introduces CSAM, a new method that enhances calibration performance over existing approaches.

Findings

01. SAM reduces calibration error in neural networks.

02. CSAM achieves lower calibration error than SAM and other methods.

03. Extensive experiments on various datasets, including ImageNet-1K, validate the proposed methods.

Abstract

Deep neural networks have been increasingly used in safety-critical applications such as medical diagnosis and autonomous driving. However, many studies suggest that they are prone to being poorly calibrated and have a propensity for overconfidence, which may have disastrous consequences. In this paper, we show that, unlike standard training such as stochastic gradient descent, the recently proposed sharpness-aware minimization (SAM) counteracts this tendency towards overconfidence. The theoretical analysis suggests that SAM allows us to learn models that are already well-calibrated by implicitly maximizing the entropy of the predictive distribution. Inspired by this finding, we further propose a variant of SAM, dubbed CSAM, to ameliorate model calibration. Extensive experiments on various datasets, including ImageNet-1K, demonstrate the benefits of SAM in reducing calibration error.…
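For readers unfamiliar with SAM, the two-step update it performs can be sketched as follows. This is a minimal illustration of the generic SAM update (an ascent step of radius `rho` along the normalized gradient, followed by a descent step using the gradient at the perturbed point), not the authors' code and not CSAM. The names `sam_step` and `toy_grad`, the toy quadratic loss, and the values of `lr` and `rho` are all assumptions made for the example.

```python
import math
from typing import Callable, List


def sam_step(
    w: List[float],
    grad_fn: Callable[[List[float]], List[float]],
    lr: float = 0.1,
    rho: float = 0.05,
) -> List[float]:
    """One SAM update on a parameter vector (illustrative sketch)."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # At a stationary point there is no ascent direction; leave w unchanged.
        return list(w)
    # Step 1: move to the approximate worst-case point within an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: descend from the ORIGINAL weights using the gradient at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_grad(w: List[float]) -> List[float]:
    """Gradient of the hypothetical toy loss 0.5 * (w0**2 + 10 * w1**2)."""
    return [w[0], 10.0 * w[1]]


if __name__ == "__main__":
    w = [1.0, 0.5]
    for _ in range(5):
        w = sam_step(w, toy_grad)
    print(w)
```

Setting `rho` to zero recovers a plain gradient step, which is one way to see that SAM differs from SGD only in where the gradient is evaluated.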

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 2

Strengths

* This paper provides a simple and practical variant. CSAM requires only a minor change and no additional computation, yet consistently improves ECE.
* The experiments are broad, covering both in- and out-of-distribution evaluations across convolutional and transformer-based models.
* The finding that SAM alone can outperform post-hoc calibration methods like temperature scaling is practically valuable for reliability-sensitive applications.
* The paper gives a clear mathemat

Weaknesses

* The empirical effect has been reported before. The observation that SAM improves calibration is not new: Zheng et al. (2021) showed similar improvements in long-tailed recognition, and the original SAM paper (Foret et al., 2021) also mentioned better reliability qualitatively.
* The theoretical assumptions are strong. The derivations rely on smoothness and bounded-Hessian assumptions, which may not fully hold for complex networks, limiting the generality of the proofs.
* While CS

Reviewer 02 · Rating 8 · Confidence 4

Strengths

(1) The paper’s key strength lies in its originality and significance: it provides the first formal theoretical explanation for why Sharpness-Aware Minimization (SAM) improves calibration, linking it to implicit entropy maximization and offering a principled understanding beyond empirical observation. This insight bridges optimization geometry and uncertainty quantification, a valuable contribution to both communities.

(2) The proposed CSAM variant is a high-quality and practical extension that con

Weaknesses

(1) Missing summation symbol in the ECE estimator (Lines 177–178). The definition of bin accuracy in the Expected Calibration Error (ECE) computation omits the summation over the samples in the bin. I think the correct expression should be $\text{acc}(B_i) = \frac{1}{|B_i|} \sum_{z_j \in B_i} \mathbb{I}[y_j = \arg \max f_\theta(x_j)]$.

(2) The theoretical analysis (Lemma 1 and Theorem 1) relies on an assumption about the lower bound of the smallest Hessian eigenvalue along the linear interpolation bet
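The bin accuracy in the expression above is one ingredient of the usual binned ECE estimator, which averages the gap between bin accuracy and bin confidence, weighted by bin size. The sketch below assumes equal-width confidence bins and top-1 confidences. The function name, the bin count, and the binning convention are illustrative choices, not details taken from the paper.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    predictions: Sequence[int],
    labels: Sequence[int],
    n_bins: int = 10,
) -> float:
    """Binned ECE: sum_i |B_i|/n * |acc(B_i) - conf(B_i)| (illustrative sketch)."""
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one sample")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, pred, label in zip(confidences, predictions, labels):
        # Equal-width bins [i/n_bins, (i+1)/n_bins); a confidence of exactly 1.0
        # is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, pred == label))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        # acc(B_i): fraction of samples in the bin whose prediction is correct,
        # i.e. the indicator summed over the bin and divided by |B_i|.
        acc = sum(1.0 for _, correct in members if correct) / len(members)
        avg_conf = sum(conf for conf, _ in members) / len(members)
        ece += (len(members) / n) * abs(acc - avg_conf)
    return ece


if __name__ == "__main__":
    confs = [0.9, 0.9, 0.6, 0.6]
    preds = [1, 1, 0, 0]
    labels = [1, 0, 0, 0]
    print(expected_calibration_error(confs, preds, labels))
```

Without the sum over the bin, the indicator would refer to a single sample rather than to a per-bin average, which is why the reviewer flags the omission.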

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. Both SAM and model calibration represent important research directions, and exploring their mutual influence holds significant scientific value.
2. The study demonstrates strong completeness by beginning with theoretical foundations to explain their respective roles and subsequently extending the analysis to methodological applications.

Weaknesses

1. Although Theorems 1, 2, and 3 are presented, they largely reflect variations of similar concepts. Consequently, the theoretical framework appears somewhat repetitive and lacks sufficient depth.
2. The methodological contribution is not particularly innovative, as it mainly combines the existing SAM framework with focal loss. Thus, the degree of novelty, especially from a methodological perspective, may be limited in its ability to inspire readers.
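For context on the focal loss this reviewer mentions, here is a minimal sketch of the standard focal loss for a single example. It down-weights the cross-entropy term by a factor of (1 - p_t)^gamma, where p_t is the probability assigned to the true class. This is the generic loss only. How CSAM actually uses it is not specified in the text above, and the function names and the default `gamma` are assumptions for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits: Sequence[float], label: int, gamma: float = 2.0) -> float:
    """Focal loss -(1 - p_t)^gamma * log(p_t) for one example (generic form)."""
    p_t = softmax(logits)[label]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    # gamma = 0 reduces to ordinary cross-entropy; larger gamma shrinks the
    # loss on examples the model already classifies confidently.
    print(focal_loss(logits, label=0, gamma=0.0))
    print(focal_loss(logits, label=0, gamma=2.0))
```

Because confident, correct predictions contribute little to this loss, training has less incentive to keep pushing their probabilities toward one, which is the usual intuition for why focal loss is linked to calibration.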

Taxonomy

Topics: Visual Attention and Saliency Detection · Industrial Vision Systems and Defect Detection · Image and Video Quality Assessment

Methods: Sharpness-Aware Minimization · Segment Anything Model