Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations

Hai Huang; Yan Xia; Sashuai Zhou; Hanting Wang; Shulei Wang; Zhou Zhao

arXiv:2507.03304·cs.CV·July 8, 2025

TL;DR

This paper introduces a unified representation approach for multi-modal domain generalization that aligns different modalities in a unified space, improving model robustness on unseen target domains.

Contribution

The paper proposes a novel unified representation framework and a supervised disentanglement method to enhance multi-modal domain generalization, addressing limitations of existing single-modal DG techniques.

Findings

01 Outperforms existing methods on benchmark datasets like EPIC-Kitchens.

02 Effectively aligns multi-modal data within a unified space for better generalization.

03 Demonstrates robustness in unseen target domains across multiple modalities.

Abstract

Domain Generalization (DG) aims to enhance model robustness in unseen or distributionally shifted target domains through training exclusively on source domains. Although existing DG techniques, such as data manipulation, learning strategies, and representation learning, have shown significant progress, they predominantly address single-modal data. With the emergence of numerous multi-modal datasets and increasing demand for multi-modal tasks, a key challenge in Multi-modal Domain Generalization (MMDG) has emerged: enabling models trained on multi-modal sources to generalize to unseen target distributions within the same modality set. Due to the inherent differences between modalities, directly transferring methods from single-modal DG to MMDG typically yields sub-optimal results. These methods often exhibit randomness during generalization due to the invisibility of target domains and…
