Jointly Modeling Inter- & Intra-Modality Dependencies for Multi-modal Learning

Divyam Madaan; Taro Makino; Sumit Chopra; Kyunghyun Cho

arXiv:2405.17613 · cs.CV · December 9, 2024

TL;DR

This paper introduces I2M2, a framework that jointly models inter- and intra-modality dependencies in multi-modal learning, leading to improved prediction accuracy in healthcare and vision-language tasks.

Contribution

The paper proposes a novel I2M2 framework that captures both inter- and intra-modality dependencies, enhancing multi-modal learning performance.

Findings

01. Outperforms traditional single-dependency models on real-world datasets

02. Demonstrates superior accuracy in healthcare and vision-language tasks

03. Validates effectiveness through state-of-the-art comparisons

Abstract

Supervised multi-modal learning involves mapping multiple modalities to a target label. Previous studies in this field have concentrated on capturing, in isolation, either the inter-modality dependencies (the relationships between different modalities and the label) or the intra-modality dependencies (the relationships within a single modality and the label). We argue that these conventional approaches, which rely solely on either inter- or intra-modality dependencies, may not be optimal in general. We view the multi-modal learning problem through the lens of generative models, where we consider the target as a source of multiple modalities and the interaction between them. Towards that end, we propose the inter- & intra-modality modeling (I2M2) framework, which captures and integrates both the inter- and intra-modality dependencies, leading to more accurate predictions. We evaluate our approach…
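As a rough illustration of what "integrating" intra- and inter-modality dependencies could look like, the sketch below combines per-class scores from two single-modality predictors and one joint predictor. The truncated abstract above does not state the combination rule, so the product-of-experts style rule used here (sum the log-probabilities, then renormalize) is an assumption, and the names `combine_experts`, `intra_a`, `intra_b`, and `inter_ab`, along with the logit values, are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of class scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def combine_experts(per_expert_logits):
    """Assumed product-of-experts rule: add each expert's per-class
    log-probabilities, then renormalize into a single distribution."""
    n_classes = len(per_expert_logits[0])
    summed = [0.0] * n_classes
    for logits in per_expert_logits:
        for k, log_p in enumerate(log_softmax(logits)):
            summed[k] += log_p
    return [math.exp(v) for v in log_softmax(summed)]


# Hypothetical logits for a 3-class problem (illustrative values only).
intra_a = [2.0, 0.5, 0.1]   # predictor that sees only modality A
intra_b = [0.3, 0.2, 0.1]   # predictor that sees only modality B
inter_ab = [0.1, 1.5, 0.2]  # predictor that sees both modalities jointly

probs = combine_experts([intra_a, intra_b, inter_ab])
print(probs)
```

Under this assumed rule, an expert that outputs a uniform distribution leaves the combined prediction unchanged, so a modality (or the joint predictor) that carries no information about the label simply drops out of the product.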

Code & Models

Repositories

divyam3897/i2m2 (PyTorch, official)

Videos

Jointly Modeling Inter- & Intra-Modality Dependencies for Multi-modal Learning · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Linguistics and terminology studies