TL;DR
This paper introduces MMDG-Bench, a comprehensive benchmark for Multimodal Domain Generalization, revealing that recent methods show limited improvements and highlighting persistent challenges like robustness and missing modalities.
Contribution
The paper presents MMDG-Bench, described by the authors as the first standardized benchmark for MMDG. It evaluates multiple methods and settings across six datasets and three tasks to give a reliable picture of the field's progress and open challenges.
Findings
Recent MMDG methods offer only marginal gains over the ERM (empirical risk minimization) baseline.
No single method outperforms others consistently across datasets.
Significant robustness issues remain under input corruptions and missing modalities.
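To make the missing-modality finding concrete, here is a minimal sketch of how such a robustness check could be set up. It is not MMDG-Bench's protocol: the zero-filling strategy, the `drop_modality` and `accuracy` helpers, the `toy_predict` fusion rule, and the three toy samples are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A sample maps a modality name to its feature vector (assumed representation).
Sample = Dict[str, List[float]]


def drop_modality(sample: Sample, missing: Optional[str]) -> Sample:
    """Return a copy of `sample` with the `missing` modality zero-filled.

    Zero-filling is one simple way to simulate an absent modality;
    it is an assumption here, not the benchmark's stated procedure.
    """
    return {
        name: ([0.0] * len(feats) if name == missing else list(feats))
        for name, feats in sample.items()
    }


def accuracy(
    predict: Callable[[Sample], int],
    samples: Sequence[Sample],
    labels: Sequence[int],
    missing: Optional[str] = None,
) -> float:
    """Fraction of samples classified correctly with one modality removed."""
    correct = sum(
        predict(drop_modality(s, missing)) == y for s, y in zip(samples, labels)
    )
    return correct / len(labels)


def toy_predict(sample: Sample) -> int:
    """Hypothetical late-fusion rule: sum every feature, threshold at zero."""
    score = sum(sum(feats) for feats in sample.values())
    return 1 if score > 0 else 0


# Toy data, invented for this example only.
samples = [
    {"video": [1.0, 0.5], "audio": [-0.2]},
    {"video": [-1.0, -0.5], "audio": [0.3]},
    {"video": [0.1, 0.0], "audio": [-0.4]},
]
labels = [1, 0, 0]

full = accuracy(toy_predict, samples, labels)
no_video = accuracy(toy_predict, samples, labels, missing="video")
no_audio = accuracy(toy_predict, samples, labels, missing="audio")
print(f"full={full:.2f} no_video={no_video:.2f} no_audio={no_audio:.2f}")
```

Comparing the full-input score with the per-modality scores shows how strongly a model leans on each input stream. A real evaluation would swap `toy_predict` for a trained model and run on held-out target domains.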
Abstract
Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies varying significantly across datasets, modality configurations, and experimental settings. Furthermore, existing benchmarks focus predominantly on action recognition, often neglecting critical real-world challenges such as input corruptions, missing modalities, and model trustworthiness. This lack of standardization obscures a reliable assessment of the field's advancement. To address this issue, we introduce MMDG-Bench, the first unified and comprehensive benchmark for MMDG, which standardizes evaluation across six datasets spanning three diverse tasks: action recognition, mechanical fault…
