Towards Efficient and Scalable Sharpness-Aware Minimization

Yong Liu; Siqi Mai; Xiangning Chen; Cho-Jui Hsieh; Yang You

arXiv:2203.02714 · cs.LG · March 8, 2022 · 5 cites

TL;DR

This paper introduces LookSAM, a more efficient variant of Sharpness-Aware Minimization that reduces computational costs while maintaining accuracy, enabling large-batch training of vision transformers from scratch in minutes.

Contribution

We propose LookSAM, a novel algorithm that computes SAM's inner gradient ascent step only periodically, significantly reducing training overhead and enabling scalable large-batch training of vision transformers.

Findings

01. LookSAM achieves similar accuracy gains to SAM with much lower computational cost.

02. We successfully scale up batch size to 64k for training ViTs from scratch.

03. Training ViTs with LookSAM in minutes maintains competitive performance.

Abstract

Recently, Sharpness-Aware Minimization (SAM), which connects the geometry of the loss landscape and generalization, has demonstrated significant performance boosts in training large-scale models such as vision transformers. However, the update rule of SAM requires two sequential (non-parallelizable) gradient computations at each step, which can double the computational overhead. In this paper, we propose a novel algorithm, LookSAM, that calculates the inner gradient ascent only periodically, significantly reducing the additional training cost of SAM. The empirical results illustrate that LookSAM achieves similar accuracy gains to SAM while being tremendously faster: its computational complexity is comparable to that of first-order optimizers such as SGD or Adam. To further evaluate the performance and scalability of LookSAM, we incorporate a layer-wise modification and perform…
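
The update rule described in the abstract can be illustrated with a minimal sketch on a toy quadratic loss. The abstract only says that the inner ascent is computed periodically; how the in-between steps approximate the SAM gradient is an assumption here (the component of the last SAM gradient orthogonal to the plain gradient is reused and rescaled). The function names, the toy loss, and the parameters `k`, `alpha`, `rho`, and `lr` are all hypothetical and chosen for illustration only.

```python
import math

def loss_grad(w):
    """Gradient of a toy loss L(w) = 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return [w[0], 10.0 * w[1]]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def sam_gradients(w, rho):
    """Return (plain gradient, SAM gradient) using two sequential passes."""
    g = loss_grad(w)                                   # pass 1: gradient at w
    scale = rho / (norm(g) + 1e-12)
    w_adv = [wi + scale * gi for wi, gi in zip(w, g)]  # inner ascent step
    g_s = loss_grad(w_adv)                             # pass 2: gradient at perturbed weights
    return g, g_s

def train(w, steps, lr=0.05, rho=0.05, k=5, alpha=1.0):
    """Run the full SAM step every k-th iteration, a cheaper step otherwise."""
    g_v = [0.0] * len(w)   # reused orthogonal component (assumed reuse strategy)
    grad_calls = 0
    for t in range(steps):
        if t % k == 0:
            # Full SAM step: two gradient evaluations.
            g, g_s = sam_gradients(w, rho)
            grad_calls += 2
            # Keep the part of the SAM gradient orthogonal to the plain gradient.
            coef = sum(a * b for a, b in zip(g_s, g)) / (sum(b * b for b in g) + 1e-12)
            g_v = [a - coef * b for a, b in zip(g_s, g)]
            update = g_s
        else:
            # Cheap step: one gradient evaluation plus the stored component.
            g = loss_grad(w)
            grad_calls += 1
            scale = alpha * norm(g) / (norm(g_v) + 1e-12)
            update = [gi + scale * vi for gi, vi in zip(g, g_v)]
        w = [wi - lr * ui for wi, ui in zip(w, update)]
    return w, grad_calls
```

In this sketch, `train([1.0, 1.0], steps=10, k=5)` performs 12 gradient evaluations, while `k=1` (a full SAM step every iteration) performs 20, which is where the cost saving comes from.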

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Memory and Neural Computing

Methods: Stochastic Gradient Descent · Adam · Sharpness-Aware Minimization