Improving Multimodal Learning with Dispersive and Anchoring Regularization
Zixuan Xia, Hao Wang, Pengcheng Weng, Yanyu Qian, Yangxin Xu, William Dan, Fei Wang

TL;DR
This paper introduces \regName, a geometry-aware regularization framework for multimodal learning that improves representation structure and robustness without architectural changes.
Contribution
It proposes a novel regularizer enforcing intra-modal diversity and inter-modal stability, addressing geometric pathologies in multimodal models.
Findings
Consistent performance improvements across multiple benchmarks.
Enhanced unimodal robustness and multimodal fusion quality.
Effective mitigation of modality trade-offs.
Abstract
Multimodal learning aims to integrate complementary information from heterogeneous modalities, yet strong optimization alone does not guarantee well-structured representations. Even under carefully balanced training schemes, multimodal models often exhibit geometric pathologies, including intra-modal representation collapse and sample-level cross-modal inconsistency, which degrade both unimodal robustness and multimodal fusion. We identify representation geometry as a missing control axis in multimodal learning and propose \regName, a lightweight geometry-aware regularization framework. \regName enforces two complementary constraints on intermediate embeddings: an intra-modal dispersive regularization that promotes representation diversity, and an inter-modal anchoring regularization that bounds sample-level cross-modal drift without rigid alignment. The proposed regularizer is…
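The two constraints named in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything below is an assumption made for illustration: the dispersive term is written as a log-mean-exp over negative pairwise squared distances within one modality, the anchoring term as a hinge that only penalizes per-sample cross-modal distance beyond a margin, and the names `dispersive_loss`, `anchoring_loss`, `geometry_regularizer`, `tau`, `margin`, `lam_disp`, and `lam_anchor` are hypothetical. It should not be read as the authors' formulation.

```python
import numpy as np


def l2_normalize(z, eps=1e-8):
    """Scale each row of z to (approximately) unit length."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def dispersive_loss(z, tau=0.5):
    """Assumed intra-modal term (not from the paper).

    Log of the mean of exp(-squared distance / tau) over all distinct pairs
    in a batch. It equals 0 when every embedding is identical (collapse)
    and becomes more negative as embeddings spread apart.
    """
    z = l2_normalize(z)
    sq_dist = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    n = z.shape[0]
    off_diag = sq_dist[~np.eye(n, dtype=bool)]  # drop self-pairs
    return float(np.log(np.mean(np.exp(-off_diag / tau))))


def anchoring_loss(z_a, z_b, margin=0.5):
    """Assumed inter-modal term (not from the paper).

    Squared hinge on the per-sample distance between paired embeddings:
    pairs closer than `margin` incur no penalty, so drift is bounded
    without forcing the two modalities to coincide.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    dist = np.linalg.norm(z_a - z_b, axis=1)
    return float(np.mean(np.maximum(0.0, dist - margin) ** 2))


def geometry_regularizer(z_a, z_b, lam_disp=0.1, lam_anchor=0.1):
    """Weighted sum of both terms, to be added to a task loss."""
    disp = dispersive_loss(z_a) + dispersive_loss(z_b)
    return lam_disp * disp + lam_anchor * anchoring_loss(z_a, z_b)
```

In this sketch the regularizer acts only on intermediate embeddings of paired samples from two modalities, so it could be added to an existing task loss without changing the model architecture.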
