CrossFlowDG: Bridging the Modality Gap with Cross-modal Flow Matching for Domain Generalization

Antonios Kritikos; Nikolaos Spanos; Athanasios Voulodimos

arXiv:2604.16892 · cs.CV · April 21, 2026

TL;DR

CrossFlowDG uses noise-free, cross-modal flow matching to transport image embeddings toward the text embeddings of their class, reducing the modality gap that contrastive alignment leaves behind and improving robustness under visual domain shift.

Contribution

The paper proposes a framework that explicitly transports image embeddings toward text embeddings in the joint latent space, addressing the modality gap in multimodal domain generalization.
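To make the notion of a modality gap concrete, the sketch below measures it as the Euclidean distance between the centroids of L2-normalized image and text embeddings. This is one common way to quantify the gap and is an assumption here, not necessarily the metric the paper uses; the function names and the toy 2-D embeddings are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def modality_gap(image_embs, text_embs):
    """Distance between the centroids of the two normalized embedding sets.

    Assumed metric for illustration only; the paper may measure the gap differently.
    """
    img_c = centroid([l2_normalize(v) for v in image_embs])
    txt_c = centroid([l2_normalize(v) for v in text_embs])
    return math.dist(img_c, txt_c)

# Toy embeddings (hypothetical): the two modalities occupy different regions.
image_embs = [[1.0, 0.1], [0.9, 0.2]]
text_embs = [[0.1, 1.0], [0.2, 0.9]]
gap = modality_gap(image_embs, text_embs)
```

A gap of zero would mean the two centroids coincide; a large value indicates that image and text embeddings sit in separate regions of the space even when matched pairs are semantically aligned.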

Findings

01. Achieves competitive performance on four DG benchmarks.

02. State-of-the-art results on TerraIncognita.

03. Effective cross-modal flow matching improves domain invariance.

Abstract

Domain generalization (DG) aims to maintain performance under domain shift, which in computer vision appears primarily as stylistic variations that cause models to overfit to domain-specific appearance cues rather than class semantics. To overcome this, recent methods use textual representations as stable, domain-invariant anchors. However, multimodal approaches that rely on cosine similarity-based contrastive alignment leave a modality gap where image and text embeddings remain geometrically separated despite semantic correspondence. We propose CrossFlowDG, a novel DG framework that addresses this residual gap using noise-free, cross-modal flow matching. By learning a continuous transformation in the joint Euclidean latent space, our framework explicitly transports domain-biased image embeddings toward domain-invariant text embeddings of the correct class. Using the efficient VMamba…
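The transport idea in the abstract can be sketched in a few lines. The example below assumes a straight-line (linear interpolation) path between an image embedding and its class text embedding, with a constant target velocity; the abstract does not state which path or solver the paper uses, so this is an illustrative assumption. "Noise-free" is read here as starting the flow at the image embedding rather than at Gaussian noise. All function names, the toy vectors, and the `oracle_velocity` stand-in for a learned network are hypothetical.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from x0 (image) to x1 (text) at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path; constant in t under the linear-path assumption."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t)
    target = target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)

def euler_transport(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        v = velocity_fn(x, step * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy embeddings (hypothetical values).
image_emb = [0.8, -0.2, 0.5]
text_emb = [0.1, 0.6, 0.4]

def oracle_velocity(x_t, t):
    """Stand-in for a trained velocity network: returns the exact target velocity."""
    return target_velocity(image_emb, text_emb)

loss = flow_matching_loss(oracle_velocity, image_emb, text_emb, t=0.3)
transported = euler_transport(oracle_velocity, image_emb)
```

In practice the velocity function would be a trained network evaluated on unseen-domain image embeddings, and the transported embedding would then be compared against the class text embeddings for prediction; with the oracle used here, the transported point simply lands on `text_emb`.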

Code & Models

Repositories

ajkrit/CrossFlowDG (GitHub)