Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations
Hai Huang, Yan Xia, Sashuai Zhou, Hanting Wang, Shulei Wang, Zhou Zhao

TL;DR
This paper introduces a unified representation approach for multi-modal domain generalization that aligns different modalities in a shared space, improving model robustness on unseen target domains in multi-modal tasks.
Contribution
The paper proposes a novel unified representation framework and a supervised disentanglement method to enhance multi-modal domain generalization, addressing limitations of existing single-modal DG techniques.
Findings
Outperforms existing methods on benchmark datasets such as EPIC-Kitchens.
Effectively aligns multi-modal data within a unified space for better generalization.
Demonstrates robustness in unseen target domains across multiple modalities.
Abstract
Domain Generalization (DG) aims to enhance model robustness in unseen or distributionally shifted target domains through training exclusively on source domains. Although existing DG techniques, such as data manipulation, learning strategies, and representation learning, have shown significant progress, they predominantly address single-modal data. With the emergence of numerous multi-modal datasets and increasing demand for multi-modal tasks, a key challenge in Multi-modal Domain Generalization (MMDG) has emerged: enabling models trained on multi-modal sources to generalize to unseen target distributions within the same modality set. Due to the inherent differences between modalities, directly transferring methods from single-modal DG to MMDG typically yields sub-optimal results. These methods often exhibit randomness during generalization due to the invisibility of target domains and…
