Understanding SAM's Robustness to Noisy Labels through Gradient Down-weighting

Hoang-Chau Luong, Quang-Thuc Nguyen, Dat Ba Tran, and Minh-Triet Tran

arXiv:2411.17132·cs.LG·March 31, 2026


TL;DR

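This paper analyzes SAM at the element-wise level and shows that the stronger amplification of clean gradients down-weights noisy gradients, which contributes to SAM's robustness against noisy labels. It introduces SANER, a SAM variant that explicitly magnifies this down-weighting effect and improves generalization under label noise.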

Contribution

The paper provides a new element-wise explanation for SAM's robustness to label noise and proposes SANER, a simple reweighting variant of SAM that further reduces noisy-label memorization.

Findings

01

SANER significantly reduces noisy-label memorization.

02

SANER improves generalization over SAM and SGD on noisy datasets.

03

SANER can be integrated into other SAM-like methods for enhanced robustness.

Abstract

Sharpness-Aware Minimization (SAM) was introduced to improve generalization by seeking flat minima, yet it also exhibits robustness to label noise, a phenomenon that remains only partially understood. Prior work has mainly attributed this effect to SAM's tendency to prolong the learning of clean samples. In this work, we provide a complementary explanation by analyzing SAM at the element-wise level. We show that when noisy gradients dominate a parameter direction, their influence is reduced by the stronger amplification of clean gradients. This slows the memorization of noisy labels while sustaining clean learning, offering a more complete account of SAM's robustness. Building on this insight, we propose SANER (Sharpness-Aware Noise-Explicit Reweighting), a simple variant of SAM that explicitly magnifies this down-weighting effect. Experiments on benchmark image classification tasks…
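To make the "element-wise level" concrete, the following is a minimal sketch, not the authors' code. It assumes a toy linear model with squared loss, uses the standard SAM perturbation (the gradient re-evaluated at w + rho * g / ||g||), and compares the SAM gradient with the plain gradient one coordinate at a time. The function names `grad`, `sam_grad`, and `elementwise_ratio`, the toy data, and the ratio diagnostic itself are illustrative assumptions; the excerpt above does not give the paper's exact definitions or SANER's reweighting rule, so none of that is reproduced here.

```python
import math


def grad(w, batch):
    """Gradient of the mean of 0.5 * (w . x - y)^2 over a batch (toy linear model)."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(batch)
    return g


def sam_grad(w, batch, rho=0.05):
    """Standard SAM gradient: the gradient evaluated at w + rho * g / ||g||."""
    g = grad(w, batch)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return g  # at a stationary point there is no ascent direction
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    return grad(w_adv, batch)


def elementwise_ratio(w, batch, rho=0.05, eps=1e-12):
    """Per-coordinate ratio of SAM gradient to plain gradient (illustrative diagnostic)."""
    g = grad(w, batch)
    gs = sam_grad(w, batch, rho)
    return [s / gi if abs(gi) > eps else float("nan") for s, gi in zip(gs, g)]


# Hypothetical toy batch: each sample touches a different coordinate.
batch = [((1.0, 0.0), 1.0), ((0.0, 2.0), 0.0)]
w = [0.0, 1.0]
print(grad(w, batch))
print(elementwise_ratio(w, batch, rho=0.05))
```

In this toy case both ratios exceed 1, and the coordinate with the larger curvature is amplified more. The point of the sketch is only that SAM rescales gradient components unevenly across coordinates, which is the kind of element-wise view the abstract describes.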
