Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm

Yilang Zhang; Bingcong Li; Georgios B. Giannakis

arXiv:2501.06603·cs.LG·January 14, 2025

TL;DR

This paper unifies various sharpness-aware minimization methods using preconditioning, provides convergence analysis, and introduces infoSAM, a new algorithm that improves robustness and generalization in deep learning.

Contribution

It offers a unifying framework for SAM variants through preconditioning (preSAM) and proposes infoSAM, a novel algorithm that addresses the adversarial model degradation issue in SAM by adjusting gradients according to noise estimates.

Findings

01 infoSAM outperforms existing SAM variants on multiple benchmarks

02 Theoretical analysis confirms convergence properties of the unified approach

03 Preconditioning enhances the effectiveness of sharpness-aware optimization methods

Abstract

Targeting solutions over 'flat' regions of the loss landscape, sharpness-aware minimization (SAM) has emerged as a powerful tool to improve the generalizability of deep neural network-based learning. While several SAM variants have been developed to this end, a unifying approach that also guides principled algorithm design has been elusive. This contribution leverages preconditioning (pre) to unify SAM variants and provide not only unifying convergence analysis, but also valuable insights. Building upon preSAM, a novel algorithm termed infoSAM is introduced to address the so-called adversarial model degradation issue in SAM by adjusting gradients depending on noise estimates. Extensive numerical tests demonstrate the superiority of infoSAM across various benchmarks.
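For readers unfamiliar with SAM, the following is a minimal sketch of the standard (unpreconditioned) SAM update on a toy quadratic loss. It is not the paper's preSAM or infoSAM, whose details are not given in the abstract. The loss, the matrix A, the helper names (loss, grad, sam_step), and the values of lr and rho are all assumptions chosen for illustration. Each step first perturbs the weights along the normalized gradient by a radius rho, then descends using the gradient evaluated at the perturbed point.

```python
import numpy as np


def loss(w, A):
    """Toy quadratic loss 0.5 * w^T A w (illustrative assumption)."""
    return 0.5 * w @ A @ w


def grad(w, A):
    """Gradient of the toy quadratic loss for symmetric A."""
    return A @ w


def sam_step(w, A, lr=0.05, rho=0.05):
    """One vanilla SAM step: ascend to a nearby point, then descend from w."""
    g = grad(w, A)
    # Perturbation of norm rho along the gradient direction.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Gradient evaluated at the perturbed weights.
    g_adv = grad(w + eps, A)
    # Descent step applied to the original weights.
    return w - lr * g_adv


# Hypothetical ill-conditioned problem: one sharp and one flat direction.
A = np.diag([10.0, 0.1])
w = np.array([1.0, 1.0])
initial_loss = loss(w, A)
for _ in range(100):
    w = sam_step(w, A)
final_loss = loss(w, A)
print(initial_loss, final_loss)
```

In this sketch the perturbation treats every coordinate alike. The abstract's unifying view is that SAM variants can be expressed through preconditioning, and infoSAM additionally adjusts gradients depending on noise estimates; neither is reproduced here.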

Taxonomy

Methods: Sharpness-Aware Minimization