Towards Multimodal Domain Generalization with Few Labels
Hongzhao Li, Hao Dong, Hualei Wan, Shupan Li, Mingliang Xu, Muhammad Haris Khan

TL;DR
This paper introduces a new problem setting, Semi-Supervised Multimodal Domain Generalization (SSMDG), and proposes a unified three-component framework that improves the robustness and generalization of multimodal models trained with limited labeled data across domains and modalities.
Contribution
The paper defines SSMDG, develops a novel framework with consensus-driven regularization, disagreement-aware regularization, and cross-modal alignment, and establishes the first benchmarks for this problem.
Findings
The proposed framework outperforms strong baselines on the SSMDG benchmarks.
The framework effectively handles missing modalities.
It improves generalization across unseen domains with limited labels.
Abstract
Multimodal models ideally should generalize to unseen domains while remaining data-efficient to reduce annotation costs. To this end, we introduce and study a new problem, Semi-Supervised Multimodal Domain Generalization (SSMDG), which aims to learn robust multimodal models from multi-source data with few labeled samples. We observe that existing approaches fail to address this setting effectively: multimodal domain generalization methods cannot exploit unlabeled data, semi-supervised multimodal learning methods ignore domain shifts, and semi-supervised domain generalization methods are confined to single-modality inputs. To overcome these limitations, we propose a unified framework featuring three key components: Consensus-Driven Consistency Regularization, which obtains reliable pseudo-labels through confident fused-unimodal consensus; Disagreement-Aware Regularization, which…
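The consensus idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says only that pseudo-labels come from "confident fused-unimodal consensus", so the concrete rule used here (the fused prediction must reach a confidence threshold and every unimodal prediction must pick the same class), the function name `consensus_pseudo_label`, and the default threshold of 0.9 are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def consensus_pseudo_label(fused_logits, unimodal_logits, threshold=0.9):
    """Hypothetical consensus rule (an assumption, not the paper's exact rule).

    Returns the fused prediction's class index when
      1. the fused prediction's confidence is at least `threshold`, and
      2. every unimodal prediction has the same top class;
    otherwise returns None, meaning the unlabeled sample gets no pseudo-label.
    """
    fused_probs = softmax(fused_logits)
    fused_class = argmax(fused_probs)
    if fused_probs[fused_class] < threshold:
        return None  # fused prediction is not confident enough
    for logits in unimodal_logits:
        if argmax(softmax(logits)) != fused_class:
            return None  # at least one modality disagrees
    return fused_class


# Confident fused prediction, both modalities agree: a pseudo-label is kept.
agree = consensus_pseudo_label([4.0, 0.0, 0.0], [[2.0, 0.5, 0.1], [1.5, 0.2, 0.3]])
# One modality prefers a different class: the sample is rejected.
disagree = consensus_pseudo_label([4.0, 0.0, 0.0], [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
# Fused prediction falls below the threshold: the sample is rejected.
unsure = consensus_pseudo_label([1.0, 0.0, 0.0], [[2.0, 0.5, 0.1], [1.5, 0.2, 0.3]])
```

Under this assumed rule, samples that are rejected for disagreement are the natural inputs to a separate disagreement-aware term, which the abstract names but this sketch does not model.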
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications
