MER-DG: Modality-Entropy Regularization for Multimodal Domain Generalization

Yavuz Yarici; Ghassan AlRegib

arXiv:2605.01967 · cs.LG · May 5, 2026

TL;DR

This paper introduces MER-DG, a regularization technique for multimodal domain generalization. It maximizes the entropy of each modality encoder's feature distribution to preserve feature diversity, which improves accuracy in environments not seen during training.

Contribution

The paper proposes Modality-Entropy Regularization for Domain Generalization (MER-DG), an architecture-agnostic method that counters Fusion Overfitting by maximizing the entropy of each encoder's feature distribution.

Findings

01  MER-DG improves average accuracy by ~5% over standard fusion methods.

02  MER-DG achieves ~2% higher performance than existing state-of-the-art techniques.

03  Experiments on the EPIC-Kitchens and HAC benchmarks validate MER-DG's effectiveness.

Abstract

Deploying multimodal models in real-world scenarios requires generalization to new environments where recording conditions differ from training, a challenge known as multimodal domain generalization (MMDG). Standard architectures employ separate encoders for each modality and a fusion module, training the system end-to-end by optimizing on the fused features. In this paper, we identify that such joint optimization causes encoders to exploit cross-modal co-occurrences, statistical relationships between modalities that arise from source-specific recording conditions, rather than learning domain-invariant features. We term this failure mode Fusion Overfitting. To address this, we propose Modality-Entropy Regularization for Domain Generalization (MER-DG), which maximizes the entropy of each encoder's feature distribution to preserve feature diversity. MER-DG is architecture-agnostic and…
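
The abstract does not say how the entropy of an encoder's feature distribution is estimated, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a diagonal-Gaussian entropy estimate computed over a batch of features, and a total loss formed by subtracting a weighted sum of per-modality entropies from the task loss. The function names, the `weight` parameter, and the modality labels in the usage note are all hypothetical.

```python
import math


def diag_gaussian_entropy(features, eps=1e-6):
    """Entropy (in nats) of a diagonal Gaussian fitted to a batch of features.

    ASSUMPTION: the estimator is illustrative; the source does not specify one.
    `features` is a list of equal-length feature vectors (one per sample).
    """
    n = len(features)
    dim = len(features[0])
    total = 0.0
    for d in range(dim):
        column = [row[d] for row in features]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        # Per-dimension Gaussian entropy: 0.5 * log(2 * pi * e * variance).
        total += 0.5 * math.log(2.0 * math.pi * math.e * (var + eps))
    return total


def regularized_loss(task_loss, features_by_modality, weight=0.1):
    """Task loss minus a weighted sum of per-encoder feature entropies.

    ASSUMPTION: `weight` and the additive form are hypothetical choices.
    Minimizing this value pushes each encoder's feature entropy up.
    """
    entropy_sum = sum(
        diag_gaussian_entropy(feats) for feats in features_by_modality.values()
    )
    return task_loss - weight * entropy_sum
```

For example, `regularized_loss(1.0, {"video": video_feats, "audio": audio_feats})` returns a lower value when each modality's features are spread out across the batch than when they collapse to a single point, so a minimizer is rewarded for keeping per-encoder features diverse.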
