MER-DG: Modality-Entropy Regularization for Multimodal Domain Generalization
Yavuz Yarici, Ghassan AlRegib

TL;DR
This paper introduces MER-DG, a regularization technique that improves multimodal domain generalization by maximizing the entropy of each encoder's features, preserving feature diversity so that models transfer better to unseen environments.
Contribution
The paper proposes Modality-Entropy Regularization (MER-DG), a novel, architecture-agnostic method that counters fusion overfitting by maximizing the entropy of each encoder's feature distribution.
Findings
MER-DG improves average accuracy by ~5% over standard fusion methods.
MER-DG achieves ~2% higher performance than existing state-of-the-art techniques.
Experiments on EPIC-Kitchens and HAC benchmarks validate MER-DG's effectiveness.
Abstract
Deploying multimodal models in real-world scenarios requires generalization to new environments where recording conditions differ from those seen during training, a challenge known as multimodal domain generalization (MMDG). Standard architectures employ separate encoders for each modality and a fusion module, training the system end-to-end by optimizing on the fused features. In this paper, we identify that such joint optimization causes encoders to exploit cross-modal co-occurrences (statistical relationships between modalities that arise from source-specific recording conditions) rather than learning domain-invariant features. We term this failure mode Fusion Overfitting. To address this, we propose Modality-Entropy Regularization for Domain Generalization (MER-DG), which maximizes the entropy of each encoder's feature distribution to preserve feature diversity. MER-DG is architecture-agnostic and…
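
To make the idea of per-encoder entropy maximization concrete, here is a minimal sketch. It is not the authors' implementation: the excerpt above does not say how entropy is estimated, so the sketch assumes a diagonal-Gaussian proxy, where the entropy of a batch of features is computed from per-dimension variances. The function names, the modality names ("video", "audio"), the weight lam, and the eps floor are all hypothetical choices made for illustration.

```python
import math


def diag_gaussian_entropy(features, eps=1e-6):
    """Differential entropy (in nats) of a diagonal Gaussian fitted to a batch.

    ASSUMPTION: this Gaussian proxy is an illustrative stand-in; the paper's
    actual entropy estimator is not given in the excerpt.
    `features` is a list of equal-length feature vectors (one per sample).
    """
    n = len(features)
    dim = len(features[0])
    total = 0.0
    for d in range(dim):
        column = [row[d] for row in features]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        # eps keeps the log finite when a dimension has collapsed to a constant.
        total += 0.5 * math.log(2.0 * math.pi * math.e * (var + eps))
    return total


def regularized_loss(task_loss, features_by_modality, lam=0.1):
    """Task loss minus a weighted sum of per-modality feature entropies.

    Subtracting the entropy means that minimizing this loss pushes each
    encoder's features toward higher entropy. `lam` is a hypothetical weight.
    """
    entropy_sum = sum(
        diag_gaussian_entropy(feats) for feats in features_by_modality.values()
    )
    return task_loss - lam * entropy_sum


# Toy batches: one encoder whose features have collapsed, one that stays diverse.
collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
diverse = [[0.0, 2.0], [1.0, -1.0], [2.0, 0.5]]

loss_collapsed = regularized_loss(1.0, {"video": collapsed, "audio": collapsed})
loss_diverse = regularized_loss(1.0, {"video": diverse, "audio": diverse})
```

With the same task loss, the batch with diverse features receives the lower regularized loss, so the regularizer penalizes encoders whose features collapse. In a real training loop the entropy term would be computed on differentiable encoder outputs inside an autodiff framework; the pure-Python version here only illustrates the arithmetic.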
