Normalization Layers Are All That Sharpness-Aware Minimization Needs

Maximilian Mueller; Tiffany Vlaar; David Rolnick; Matthias Hein

arXiv:2306.04226 · cs.LG · November 20, 2023 · 2 cites

TL;DR

This paper demonstrates that perturbing only normalization layer parameters in sharpness-aware minimization (SAM) can outperform perturbing all parameters, suggesting normalization layers are key to SAM's effectiveness in improving generalization.

Contribution

The study shows that restricting the adversarial perturbation to the normalization layers alone is sufficient for SAM's success, challenging the belief that SAM's generalization benefit comes solely from reduced sharpness.

Findings

01

Perturbing only normalization parameters in SAM outperforms full-parameter perturbation.

02

Alternative sparse perturbations do not match normalization layer effectiveness.

03

Normalization layers are uniquely influential in SAM's performance.

Abstract

Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima and has been shown to enhance generalization performance in various settings. In this work we show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters. This finding generalizes to different SAM variants and both ResNet (Batch Normalization) and Vision Transformer (Layer Normalization) architectures. We consider alternative sparse perturbation approaches and find that these do not achieve similar performance enhancement at such extreme sparsity levels, showing that this behaviour is unique to the normalization layers. Although our findings reaffirm the effectiveness of SAM in improving generalization performance, they cast doubt on whether this is solely caused by reduced…
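
To make the adversarial step described in the abstract concrete, here is a minimal pure-Python sketch of one SAM update in which only a chosen subset of parameters is perturbed. It is an illustration under stated assumptions, not the authors' implementation: the toy model, the data, and the names `NORM_KEYS`, `loss_and_grad`, `perturb`, and `sam_step` are all hypothetical, and `gamma` and `beta` merely stand in for the affine parameters of a normalization layer. The sketch also assumes that the gradient norm used to scale the perturbation is taken over the perturbed parameters only.

```python
import math

# Hypothetical toy model standing in for a network with one normalization layer:
#   pred = gamma * (w0*x0 + w1*x1) + beta
# gamma and beta play the role of the affine normalization parameters.
NORM_KEYS = {"gamma", "beta"}

def loss_and_grad(p, data):
    """Return 0.5 * mean squared error and its analytic gradient."""
    n = len(data)
    loss, grad = 0.0, {k: 0.0 for k in p}
    for (x0, x1), y in data:
        z = p["w0"] * x0 + p["w1"] * x1
        r = p["gamma"] * z + p["beta"] - y  # residual for this sample
        loss += 0.5 * r * r / n
        grad["w0"] += r * p["gamma"] * x0 / n
        grad["w1"] += r * p["gamma"] * x1 / n
        grad["gamma"] += r * z / n
        grad["beta"] += r / n
    return loss, grad

def perturb(p, grad, keys, rho):
    """Ascent step of radius rho, restricted to the parameters named in keys."""
    # Assumption: the norm is computed over the perturbed parameters only.
    norm = math.sqrt(sum(grad[k] ** 2 for k in keys))
    scale = rho / (norm + 1e-12)
    return {k: (v + scale * grad[k]) if k in keys else v for k, v in p.items()}

def sam_step(p, data, keys, rho=0.05, lr=0.05):
    """One SAM update: perturb the parameters in keys, then update all parameters."""
    _, g = loss_and_grad(p, data)          # gradient at the current point
    p_adv = perturb(p, g, keys, rho)       # adversarial (ascent) step
    _, g_adv = loss_and_grad(p_adv, data)  # gradient at the perturbed point
    # Plain SGD on ALL parameters, using the gradient from the perturbed point.
    return {k: v - lr * g_adv[k] for k, v in p.items()}

# Made-up data and starting point, for illustration only.
data = [((1.0, 0.0), 2.5), ((0.0, 1.0), -1.5), ((1.0, 1.0), 0.5), ((2.0, 1.0), 2.5)]
params = {"w0": 0.5, "w1": -0.5, "gamma": 1.0, "beta": 0.0}
loss_before, _ = loss_and_grad(params, data)
new_params = sam_step(params, data, NORM_KEYS)
loss_after, _ = loss_and_grad(new_params, data)
```

Passing `set(params)` instead of `NORM_KEYS` perturbs every parameter, which corresponds to the full-parameter variant the abstract compares against. In both cases the descent step still updates all parameters; only the ascent step is restricted.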

Code & Models

Repositories

mueller-mp/sam-on
PyTorch · Official

Videos

Normalization Layers Are All That Sharpness-Aware Minimization Needs · SlidesLive

Taxonomy

Topics: Image Enhancement Techniques · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection

Methods: Attention Is All You Need · Average Pooling · Convolution · Global Average Pooling · Dense Connections · Position-Wise Feed-Forward Layer · Max Pooling · Label Smoothing · Kaiming Initialization · Segment Anything Model