Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts

Ruipeng Zhang; Ziqing Fan; Jiangchao Yao; Ya Zhang; Yanfeng Wang

arXiv:2405.18861 · cs.CV · May 30, 2024 · 2 cites

TL;DR

This paper introduces DISAM, a novel optimization algorithm that improves convergence and generalization under domain shifts by balancing domain-level convergence during sharpness-aware minimization.

Contribution

DISAM is the first method to incorporate domain-level convergence consistency into sharpness-aware minimization, enhancing performance under domain shifts.

Findings

01

DISAM achieves faster convergence and better generalization on domain generalization benchmarks.

02

DISAM outperforms state-of-the-art methods in various domain shift scenarios.

03

DISAM is more efficient in parameter-efficient fine-tuning with pretrained models.

Abstract

This paper presents a Domain-Inspired Sharpness-Aware Minimization (DISAM) algorithm for optimization under domain shifts. It is motivated by the inconsistent convergence degree of SAM across different domains, which induces optimization bias towards certain domains and thus impairs the overall convergence. To address this issue, we consider the domain-level convergence consistency in the sharpness estimation to prevent the overwhelming (deficient) perturbations for less (well) optimized domains. Specifically, DISAM introduces the constraint of minimizing variance in the domain loss, which allows the elastic gradient calibration in perturbation generation: when one domain is optimized above the averaging level w.r.t. loss, the gradient perturbation towards that domain will be weakened automatically, and vice versa. Under this mechanism, we theoretically show that DISAM can…
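The elastic calibration described in the abstract can be illustrated with a small numerical sketch. It assumes, as one plausible reading of the abstract and not a confirmed reproduction of the authors' code, that the perturbation direction is the gradient of mean(L_i) - lam * var(L_i) over the per-domain losses L_i. The function names, the `lam` and `rho` parameters, and the use of population variance are all hypothetical choices made for illustration.

```python
import numpy as np

def perturbation_weights(domain_losses, lam=0.1):
    """Per-domain weights on grad(L_i) in the gradient of mean(L) - lam * var(L).

    Assumption: this objective form is an illustrative reading of the abstract.
    """
    losses = np.asarray(domain_losses, dtype=float)
    m = losses.size
    # d/dL_i [mean(L) - lam * var(L)] = 1/m - (2 * lam / m) * (L_i - mean(L))
    return (1.0 - 2.0 * lam * (losses - losses.mean())) / m

def perturbation(domain_losses, domain_grads, rho=0.05, lam=0.1):
    """SAM-style perturbation of radius rho built from reweighted domain gradients."""
    weights = perturbation_weights(domain_losses, lam)
    grads = np.asarray(domain_grads, dtype=float)  # shape (num_domains, num_params)
    direction = weights @ grads                    # weighted sum of domain gradients
    return rho * direction / (np.linalg.norm(direction) + 1e-12)

if __name__ == "__main__":
    # Domain 3 has the highest loss, so its gradient gets the smallest weight.
    print(perturbation_weights([1.0, 2.0, 3.0], lam=0.1))
```

With losses of 1.0, 2.0, and 3.0 and `lam=0.1`, the weights are roughly 0.40, 0.33, and 0.27: the domain whose loss sits above the average contributes less to the perturbation, and the domain below the average contributes more, while the weights still sum to one.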

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

They identify that the use of SAM has a detrimental impact on training under domain shifts, and further analyze that the reason is the inconsistent convergence of training domains that deviates from the underlying i.i.d assumption of SAM.

Weaknesses

This paper considers the domain-level convergence consistency in SAM for multiple domains, and proposes to adopt the domain loss variance in the training loss. The convergence consistency is a general issue, and the solution is normal, thus the novelty is not so clear for publication in ICLR.

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 4

Strengths

1. The proposed method targets model generalization under domain shifts, which is a common challenge in machine learning. To date, there has been a lack of thorough investigation into sharpness-based optimization in the context of domain shifts, and the idea of constraining the variance of losses among training domains is interesting.
2. The paper not only presents theoretical evidence showcasing the efficiency of DISAM, but it also provides empirical data to support this claim, demonstrat

Weaknesses

1. SAM-based optimization incurs twice the computational overhead and additional storage overhead in comparison to the commonly used SGD. While DISAM, the method proposed in this paper, demonstrates faster convergence under domain shift conditions when compared to SAM, it does not include a comparison with optimizers such as SGD or Adam.
2. This paper employs multiple benchmarks to evaluate the performance of multi-source domain generalization. The article highlights the need for advancements i
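The overhead the reviewer refers to comes from SAM evaluating the gradient twice per update: once at the current weights to build the perturbation, and once at the perturbed weights to take the descent step. The sketch below shows this on a toy quadratic loss 0.5 * w^T A w. The `CountingGrad` helper, the step sizes, and the toy loss are illustrative assumptions and are not taken from the paper or its repository.

```python
import numpy as np

class CountingGrad:
    """Gradient of the toy loss 0.5 * w^T A w that counts how often it is called."""
    def __init__(self, A):
        self.A = np.asarray(A, dtype=float)
        self.calls = 0

    def __call__(self, w):
        self.calls += 1
        return self.A @ w

def sgd_step(w, grad_fn, lr=0.1):
    # One gradient evaluation per update.
    return w - lr * grad_fn(w)

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    g = grad_fn(w)                               # 1st evaluation: at current weights
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent step of radius rho
    g_sharp = grad_fn(w + eps)                   # 2nd evaluation: at perturbed weights
    return w - lr * g_sharp

if __name__ == "__main__":
    A = np.diag([1.0, 4.0])
    w0 = np.array([1.0, 1.0])
    sgd_grad, sam_grad = CountingGrad(A), CountingGrad(A)
    sgd_step(w0, sgd_grad)
    sam_step(w0, sam_grad)
    print(sgd_grad.calls, sam_grad.calls)
```

The second evaluation is also why SAM needs to hold the perturbation (or a copy of the original weights) in memory until the update is applied.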

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 5

Strengths

As of now, there has not yet been a sharpness-aware minimization (SAM) methodology developed specifically for addressing distribution shifts. The issue of varying convergence rates across different domains, as observed in SAM, is undeniably a significant challenge. This methodology presents an impressive degree of compatibility, as it can be integrated with a variety of sharpness-variants. An especially commendable aspect of this approach is its computational efficiency. Compared to standard SA

Weaknesses

The idea of minimizing the variance between losses, a core aspect of the presented methodology, is not entirely novel. Similar concepts have been previously explored in methods like vREX (Out-of-Distribution Generalization via Risk Extrapolation) and further extended to gradient computations in methodologies like Fishr (Invariant Gradient Variances for Out-of-Distribution Generalization). In this context, the proposed approach appears to be an incremental adaptation of vREX principles applied sp
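For context on the comparison the reviewer draws, the variance-penalized objective from the Risk Extrapolation paper can be sketched as the sum of per-domain training risks plus a weighted variance of those risks. The function name and the default `beta` are illustrative, and the use of population variance is an assumption. The difference from the abstract's description is where the variance enters: here it is part of the training objective itself, whereas the abstract places it in the perturbation generation of SAM.

```python
import numpy as np

def vrex_objective(domain_risks, beta=1.0):
    """Sum of per-domain risks plus beta times their variance (V-REx style penalty)."""
    risks = np.asarray(domain_risks, dtype=float)
    return risks.sum() + beta * risks.var()

if __name__ == "__main__":
    # Same total risk, but the unbalanced case is penalized more.
    print(vrex_objective([1.0, 1.0]), vrex_objective([0.0, 2.0]))
```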

Code & Models

Repositories

mediabrain-sjtu/disam
PyTorch · Official

Taxonomy

Topics: Advanced Computing and Algorithms · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Sharpness-Aware Minimization