Jointly Modeling Inter- & Intra-Modality Dependencies for Multi-modal Learning
Divyam Madaan, Taro Makino, Sumit Chopra, Kyunghyun Cho

TL;DR
This paper introduces I2M2, a framework that jointly models inter- and intra-modality dependencies in multi-modal learning, leading to improved prediction accuracy in healthcare and vision-language tasks.
Contribution
The paper proposes a novel I2M2 framework that captures both inter- and intra-modality dependencies, enhancing multi-modal learning performance.
Findings
Outperforms traditional single-dependency models on real-world datasets
Demonstrates superior accuracy in healthcare and vision-language tasks
Validates effectiveness through state-of-the-art comparisons
Abstract
Supervised multi-modal learning involves mapping multiple modalities to a target label. Previous studies in this field have concentrated on capturing, in isolation, either the inter-modality dependencies (the relationships between different modalities and the label) or the intra-modality dependencies (the relationships within a single modality and the label). We argue that these conventional approaches, which rely solely on either inter- or intra-modality dependencies, may not be optimal in general. We view the multi-modal learning problem through the lens of generative models, where we consider the target as a source of multiple modalities and the interaction between them. Towards that end, we propose the inter- & intra-modality modeling (I2M2) framework, which captures and integrates both the inter- and intra-modality dependencies, leading to more accurate predictions. We evaluate our approach…
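The abstract does not spell out how I2M2 combines the two kinds of dependencies. Below is a minimal sketch under one plausible assumption: each modality gets its own classifier (capturing intra-modality dependencies), one classifier sees all modalities together (capturing inter-modality dependencies), and their label distributions are multiplied, i.e. their log-probabilities are summed and renormalized (a product-of-experts rule). The function names, the logit values, and the product-of-experts rule itself are illustrative assumptions, not the authors' code.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over a vector of class logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def combine_dependencies(intra_logits: Sequence[Sequence[float]],
                         inter_logits: Sequence[float]) -> list[float]:
    """Hypothetical product-of-experts combination (an assumption, not I2M2's code).

    intra_logits: one logit vector per modality-specific (intra-modality) model.
    inter_logits: logit vector from a model that sees all modalities jointly.
    Returns a probability distribution over the shared label set.
    """
    n = len(inter_logits)
    total = log_softmax(inter_logits)  # inter-modality log-probabilities
    for logits in intra_logits:
        if len(logits) != n:
            raise ValueError("all models must score the same label set")
        # Adding log-probabilities multiplies the per-model distributions.
        total = [t + lp for t, lp in zip(total, log_softmax(logits))]
    # Renormalize the product so it sums to 1.
    return [math.exp(v) for v in log_softmax(total)]


# Hypothetical logits for a 3-class problem with two modalities.
image_logits = [2.0, 0.5, 0.1]  # intra-modality model on modality 1
text_logits = [0.3, 1.8, 0.2]   # intra-modality model on modality 2
joint_logits = [1.0, 1.2, 0.0]  # inter-modality model on both modalities
probs = combine_dependencies([image_logits, text_logits], joint_logits)
print([round(p, 3) for p in probs])
```

Under this rule, a label wins only if it is supported by the joint model and by the individual modalities together; dropping the `intra_logits` list reduces it to a purely inter-modality predictor, and dropping the joint model would reduce it to a purely intra-modality one, which are the two single-dependency baselines the abstract contrasts with.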
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · linguistics and terminology studies
