Improving SAM Requires Rethinking its Optimization Formulation

Wanyun Xie; Fabian Latorre; Kimon Antonakopoulos; Thomas Pethick; Volkan Cevher

arXiv:2407.12993 · cs.LG · July 19, 2024

TL;DR

This paper proposes a new formulation of Sharpness-Aware Minimization (SAM) as a bilevel optimization problem, called BiSAM, which uses a relaxation of the 0-1 loss to strengthen the perturbation and improve model performance.

Contribution

It introduces BiSAM, a novel SAM variant reformulated as a bilevel optimization problem with a 0-1 loss surrogate, leading to stronger perturbations and better results.

Findings

01. BiSAM outperforms the original SAM and its variants in experiments.
02. BiSAM maintains computational complexity similar to that of SAM.
03. The code for BiSAM is publicly available.
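The claim of similar cost can be illustrated with a minimal NumPy sketch. It assumes a toy linear softmax classifier, an ℓ2-normalized ascent step as in the original SAM, cross-entropy as the loss minimized at the perturbed point, and a hypothetical tanh lower bound on the 0-1 loss to drive the perturbation. The names `grad_tanh_lower`, `perturbed_step`, and `bisam_like_step` are illustrative only and are not taken from the LIONS-EPFL/BiSAM repository. Both update rules below use exactly two gradient evaluations per step; only the loss that drives the ascent changes.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def grad_ce(W, X, y):
    """Gradient of mean cross-entropy (the minimized loss in this sketch) for logits X @ W."""
    n = X.shape[0]
    P = softmax(X @ W)
    P[np.arange(n), y] -= 1.0  # softmax minus one-hot labels
    return X.T @ P / n

def grad_tanh_lower(W, X, y, mu=1.0):
    """Gradient of mean tanh(mu * max_{j != y}(z_j - z_y)), a hypothetical lower bound."""
    n = X.shape[0]
    Z = X @ W
    rows = np.arange(n)
    margins = Z - Z[rows, y][:, None]
    margins[rows, y] = -np.inf                 # exclude the true class
    j_star = margins.argmax(axis=1)            # strongest wrong class per sample
    m = margins[rows, j_star]
    coef = mu * (1.0 - np.tanh(mu * m) ** 2)   # derivative of tanh(mu * m) w.r.t. m
    G = np.zeros_like(Z)
    G[rows, j_star] += coef
    G[rows, y] -= coef
    return X.T @ G / n

def perturbed_step(W, X, y, ascent_grad, rho=0.05, lr=0.1):
    """SAM-style update: ascend along ascent_grad, then descend on cross-entropy."""
    g = ascent_grad(W, X, y)                       # gradient evaluation 1
    eps = rho * g / (np.linalg.norm(g) + 1e-12)    # scale the perturbation to norm ~rho
    return W - lr * grad_ce(W + eps, X, y)         # gradient evaluation 2

def sam_step(W, X, y, rho=0.05, lr=0.1):
    """Original SAM: the perturbation ascends the same cross-entropy loss."""
    return perturbed_step(W, X, y, grad_ce, rho, lr)

def bisam_like_step(W, X, y, rho=0.05, lr=0.1):
    """Hypothetical BiSAM-like variant: the perturbation ascends the lower-bound surrogate."""
    return perturbed_step(W, X, y, grad_tanh_lower, rho, lr)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 5))
    y = rng.integers(0, 3, size=64)
    W = np.zeros((5, 3))
    for _ in range(100):
        W = bisam_like_step(W, X, y)
```

Swapping `grad_ce` for `grad_tanh_lower` in the ascent is the only difference between the two rules, which is why the per-step cost in this sketch stays that of SAM.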

Abstract

This paper rethinks Sharpness-Aware Minimization (SAM), which was originally formulated as a zero-sum game in which the weights of a network and a bounded perturbation try to minimize and maximize, respectively, the same differentiable loss. To fundamentally improve this design, we argue that SAM should instead be reformulated using the 0-1 loss. As a continuous relaxation, we follow the simple conventional approach in which the minimizing (maximizing) player uses an upper-bound (lower-bound) surrogate to the 0-1 loss. This leads to a novel formulation of SAM as a bilevel optimization problem, dubbed BiSAM. BiSAM, with a newly designed lower-bound surrogate loss, indeed constructs stronger perturbations. Through numerical evidence, we show that BiSAM consistently results in improved performance when compared to the original SAM and its variants, while enjoying similar computational complexity. Our code…
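The two formulations described in the abstract can be written side by side. This is a sketch under assumed notation (an ℓ2 ball of radius ρ, per-sample surrogates ℓ_low ≤ ℓ_01 ≤ ℓ_up averaged over n training pairs); the paper's exact objective and surrogates may differ.

```latex
\begin{aligned}
&\text{SAM:} && \min_{w}\ \max_{\|\epsilon\|_2 \le \rho}\ L(w + \epsilon) \\
&\text{BiSAM (sketch):} && \min_{w}\ \frac{1}{n}\sum_{i=1}^{n} \ell_{\mathrm{up}}\big(w + \epsilon^{\star}(w);\, x_i, y_i\big) \\
& && \text{s.t. } \epsilon^{\star}(w) \in \arg\max_{\|\epsilon\|_2 \le \rho}\ \frac{1}{n}\sum_{i=1}^{n} \ell_{\mathrm{low}}\big(w + \epsilon;\, x_i, y_i\big), \\
& && \text{with } \ell_{\mathrm{low}} \le \ell_{01} \le \ell_{\mathrm{up}} \text{ pointwise.}
\end{aligned}
```

Setting both surrogates to the same loss L collapses the bilevel problem back to the SAM min-max, so the separate lower bound is what changes how the perturbation is chosen.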

Peer Reviews

Decision: ICML 2024 Poster

Reviewer 01 · Rating: 10 (strong accept, should be highlighted at the conference) · Confidence: 4

Strengths

- The approach is simple, scalable, and theoretically sound.
- The flow is easy to follow.
- The improvements are convincing and validated in many learning scenarios, including standard learning, fine-tuning, and learning with noisy data.

Weaknesses

- As mentioned in the conclusion, it would be great to see whether BiSAM benefits other domains, e.g., NLP.

Reviewer 02 · Rating: 6 (marginally above the acceptance threshold) · Confidence: 3

Strengths

- The idea of directly targeting the min-max of the 0-1 loss, and accordingly minimizing/maximizing different surrogates, brings novelty.
- The authors provide a theoretically justified lower bound for practical implementation. They also provide a clear discussion of two different choices of surrogates.
- The numerical results demonstrate that BiSAM improves accuracy.
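To make the upper/lower surrogate structure concrete, the bound property can be checked numerically. The sketch below assumes a multiclass 0-1 loss that counts a sample as wrong when any other class scores at least as high as the true class, uses cross-entropy divided by ln 2 as an upper bound, and uses a hypothetical tanh of the largest wrong-class margin as a lower bound. These are illustrative choices, not necessarily the surrogates used in the paper.

```python
import numpy as np

def wrong_margin(Z, y):
    """Largest z_j - z_y over classes j != y, per sample."""
    rows = np.arange(len(y))
    M = Z - Z[rows, y][:, None]
    M[rows, y] = -np.inf  # exclude the true class
    return M.max(axis=1)

def zero_one(Z, y):
    """1.0 if some wrong class scores at least as high as the true class, else 0.0."""
    return (wrong_margin(Z, y) >= 0).astype(float)

def ce_upper(Z, y):
    """Cross-entropy divided by ln 2; at least 1 on every misclassified sample."""
    rows = np.arange(len(y))
    Zs = Z - Z.max(axis=1, keepdims=True)  # stabilize the log-sum-exp
    lse = np.log(np.exp(Zs).sum(axis=1))
    return (lse - Zs[rows, y]) / np.log(2.0)

def tanh_lower(Z, y, mu=1.0):
    """Hypothetical lower bound: tanh(mu * largest wrong-class margin), mu > 0."""
    return np.tanh(mu * wrong_margin(Z, y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(scale=3.0, size=(500, 10))
    y = rng.integers(0, 10, size=500)
    l01 = zero_one(Z, y)
    print(bool(np.all(tanh_lower(Z, y) <= l01)), bool(np.all(l01 <= ce_upper(Z, y))))
```

The upper bound holds because a misclassified sample puts softmax probability at most 1/2 on its true class, so its cross-entropy is at least ln 2; the tanh bound is negative on correct samples and below 1 on wrong ones.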

Weaknesses

- The numerical results show limited improvements. Also, in some other works (Foret et al., 2021; Liu et al., 2022), SAM achieves higher accuracy than BiSAM does in this paper (with the same model and number of epochs).
Liu, Y., Mai, S., Cheng, M., Chen, X., Hsieh, C. J., & You, Y. (2022). Random sharpness-aware minimization. Advances in Neural Information Processing Systems, 35, 24543-24556.

Reviewer 03 · Rating: 6 (marginally above the acceptance threshold) · Confidence: 4

Strengths

The BiSAM method proposed in the paper somewhat resolves the issue of optimizing the 0-1 loss using gradients. This method has been validated across multiple datasets, demonstrating its advantages over SAM through extensive experiments.

Weaknesses

1. The idea of BiSAM is very good, but its performance in experiments is only marginally better than that of SAM. The improvement over SAM is often within the error range, making it hard to believe that BiSAM is an enhancement of SAM.
2. Can you explain why BiSAM using tanh as the lower bound has higher test accuracy on CIFAR-10 than using -log as the lower bound, while the results are the opposite on CIFAR-100?
3. Could you combine the characteristics of tanh and -log to create a new lower bound…
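On the third question, one simple observation is that any convex combination, or pointwise maximum, of two lower bounds on the 0-1 loss is again a lower bound. The sketch below checks this for a tanh bound and a hypothetical log-based bound, 1 - log2(1 + exp(-μm)), applied to the largest wrong-class margin m. The exact "-log" surrogate from the paper is not given here and may differ; `mixed_lb` and its mixing weight `lam` are illustrative names.

```python
import numpy as np

def top_wrong_margin(Z, y):
    """Largest margin z_j - z_y over wrong classes j; >= 0 means misclassified."""
    rows = np.arange(len(y))
    M = Z - Z[rows, y][:, None]
    M[rows, y] = -np.inf  # ignore the true class itself
    return M.max(axis=1)

def tanh_lb(m, mu=1.0):
    """tanh(mu*m): in [0, 1) when m >= 0, negative when m < 0 (mu > 0)."""
    return np.tanh(mu * m)

def log_lb(m, mu=1.0):
    """Hypothetical log-based bound 1 - log2(1 + exp(-mu*m)), computed stably."""
    return 1.0 - np.logaddexp(0.0, -mu * m) / np.log(2.0)

def mixed_lb(m, lam=0.5, mu=1.0):
    """Convex combination of the two lower bounds (lam in [0, 1])."""
    return lam * tanh_lb(m, mu) + (1.0 - lam) * log_lb(m, mu)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(scale=3.0, size=(1000, 10))
    y = rng.integers(0, 10, size=1000)
    m = top_wrong_margin(Z, y)
    l01 = (m >= 0).astype(float)
    for lam in (0.0, 0.5, 1.0):
        print(lam, bool(np.all(mixed_lb(m, lam) <= l01)))
```

One visible difference between the two pieces is their behavior on confidently correct samples: tanh saturates at -1, so its gradient vanishes, while the log-based bound keeps decreasing linearly in m. Whether this accounts for the CIFAR-10/CIFAR-100 discrepancy raised in the second question is not established by this sketch.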

Code & Models

Repositories

LIONS-EPFL/BiSAM
pytorch · Official

Taxonomy

Topics: Fault Detection and Control Systems · Software Reliability and Analysis Research

Methods: Sharpness-Aware Minimization