Learning Gradient-based Mixup with Extrapolation toward Flatter Minima for Domain Generalization

Danni Peng; Sinno Jialin Pan

arXiv:2209.14742·cs.LG·April 28, 2026


TL;DR

This paper introduces FGMix, a gradient-based mixup method with extrapolation aimed at covering unseen data regions and finding flatter minima to improve domain generalization performance.

Contribution

The paper proposes a mixup policy, FGMix, that uses gradient-based compatibilities to assign greater weights to instances carrying more invariant information, and learns that policy toward flatter minima so the model generalizes better to unseen domains.

Findings

01. FGMix outperforms existing DG methods on the DomainBed benchmark.

02. Gradient-based mixup with extrapolation extends coverage to regions of feature space not seen during training.

03. Flatter minima correlate with improved domain generalization.
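
To make the notion of a "flatter" minimum concrete, the sketch below estimates how much a loss can rise when the parameters are randomly perturbed within a small box around a minimum. This is a generic sharpness proxy chosen for illustration, not the flatness measure used in the paper; the function `sharpness`, the two toy quadratic losses, and the `radius` and `n_samples` settings are all assumptions.

```python
import random


def sharpness(loss_fn, params, radius=0.1, n_samples=200, seed=0):
    """Largest observed loss increase under random perturbations of params.

    Hypothetical proxy: each coordinate is shifted by a uniform draw from
    [-radius, radius]; a smaller return value suggests a flatter minimum.
    """
    rng = random.Random(seed)
    base = loss_fn(params)
    worst = 0.0
    for _ in range(n_samples):
        perturbed = [p + rng.uniform(-radius, radius) for p in params]
        worst = max(worst, loss_fn(perturbed) - base)
    return worst


def flat_loss(w):
    # Toy loss with low curvature around its minimum at the origin.
    return 0.1 * sum(x * x for x in w)


def sharp_loss(w):
    # Toy loss with high curvature around the same minimum.
    return 10.0 * sum(x * x for x in w)


minimum = [0.0, 0.0]
flat_value = sharpness(flat_loss, minimum)
sharp_value = sharpness(sharp_loss, minimum)
print(flat_value < sharp_value)
```

Both toy losses share the same minimizer and the same minimum value, so only the curvature around the minimum separates them; the proxy reports a smaller value for the low-curvature loss.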

Abstract

To address distribution shifts between training and test data, domain generalization (DG) leverages multiple source domains to learn a model that generalizes well to unseen domains. However, existing DG methods often overfit to the source domains, partly due to the limited coverage of the expected region in feature space. Motivated by this, we propose performing mixup with data interpolation and extrapolation to cover potentially unseen regions. To prevent the detrimental effects of unconstrained extrapolation, we carefully design a policy to generate the instance weights, named Flatness-aware Gradient-based Mixup (FGMix). The policy relies on gradient-based compatibilities to assign greater weights to instances that carry more invariant information and learn the mixup policy towards flatter minima for better generalization. On the DomainBed benchmark, we validate the efficacy of…
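
The difference between interpolation and extrapolation in mixup can be illustrated with a small sketch. It mixes feature vectors using instance weights that sum to 1: weights inside [0, 1] give a point between the inputs, while weights outside that range give a point beyond them. The weights here are fixed by hand; in FGMix they come from a learned policy, which this sketch does not attempt to reproduce. The function name `mix_features` and the toy vectors are assumptions.

```python
def mix_features(features, weights):
    """Weighted combination of equal-length feature vectors.

    Assumes the weights sum to 1. Weights in [0, 1] interpolate;
    negative weights or weights above 1 extrapolate.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    dim = len(features[0])
    return [
        sum(w * f[k] for w, f in zip(weights, features))
        for k in range(dim)
    ]


# Two toy feature vectors standing in for instances from different domains.
a = [0.0, 0.0]
b = [2.0, 4.0]

interpolated = mix_features([a, b], [0.5, 0.5])   # midpoint between a and b
extrapolated = mix_features([a, b], [-0.5, 1.5])  # lies past b, away from a

print(interpolated)  # [1.0, 2.0]
print(extrapolated)  # [3.0, 6.0]
```

The extrapolated point falls outside the segment joining the two inputs, which is how extrapolation can reach regions that interpolation alone never covers. It also shows why the weights need constraining: nothing in this function limits how far the result can move from the data.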
