Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification
Raja Kumar, Raghav Singhal, Pranamya Kulkarni, Deval Mehta, Kshitij Jadhav

TL;DR
This paper introduces M3CoL, a multimodal mixup contrastive learning method that captures shared relations across modalities, improving classification performance across diverse datasets by aligning mixed samples and integrating auxiliary supervision.
Contribution
The paper proposes a novel mixup-based contrastive loss and a framework that captures shared relations in multimodal data, enhancing robustness and generalization in multimodal classification.
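As a rough illustration of what a mixup-based contrastive loss can look like, the sketch below mixes samples of one modality and aligns each mixed sample with the two corresponding samples of the other modality, using soft targets weighted by the mixing coefficient. This is a minimal sketch under assumptions, not the paper's exact formulation: the function names, the temperature value, the soft-target form, and mixing at the embedding level (rather than at the input level) are all illustrative choices.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def mixup_contrastive_loss(z_mix_a, z_b, lam, perm, temperature=0.1):
    """Hypothetical mixup contrastive loss (illustrative only).

    z_mix_a: (batch, dim) embeddings of mixed samples from modality A,
             where sample i was mixed with sample perm[i] using weight lam.
    z_b:     (batch, dim) embeddings of the unmixed samples from modality B.
    """
    z_mix_a = l2_normalize(z_mix_a)
    z_b = l2_normalize(z_b)
    logits = z_mix_a @ z_b.T / temperature  # (batch, batch) similarities
    n = logits.shape[0]
    # Soft targets: weight lam on the partner of sample i,
    # weight (1 - lam) on the partner of sample perm[i].
    targets = np.zeros((n, n))
    targets[np.arange(n), np.arange(n)] += lam
    targets[np.arange(n), perm] += 1.0 - lam
    return float(-(targets * log_softmax(logits)).sum(axis=1).mean())

# Toy usage with random embeddings standing in for encoder outputs.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(4, 8))
z_b = z_a + 0.01 * rng.normal(size=(4, 8))
perm = np.array([1, 2, 3, 0])  # mixing partner for each sample
lam = 0.7                      # mixing coefficient (assumed fixed here)
z_mix_a = lam * z_a + (1 - lam) * z_a[perm]
loss = mixup_contrastive_loss(z_mix_a, z_b, lam, perm)
```

In practice the mixing coefficient is commonly drawn from a Beta distribution per batch, and a symmetric term that mixes the second modality is often added; both are left out here for brevity.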
Findings
Outperforms state-of-the-art methods on the N24News, ROSMAP, and BRCA datasets.
Achieves performance comparable to state-of-the-art methods on Food-101.
Effectively captures shared multimodal relations through mixup contrastive learning.
Abstract
Deep multimodal learning has shown remarkable success by leveraging contrastive learning to capture explicit one-to-one relations across modalities. However, real-world data often exhibits shared relations beyond simple pairwise associations. We propose M3CoL, a Multimodal Mixup Contrastive Learning approach to capture nuanced shared relations inherent in multimodal data. Our key contribution is a Mixup-based contrastive loss that learns robust representations by aligning mixed samples from one modality with their corresponding samples from other modalities, thereby capturing shared relations between them. For multimodal classification tasks, we introduce a framework that integrates a fusion module with unimodal prediction modules for auxiliary supervision during training, complemented by our proposed Mixup-based contrastive loss. Through extensive experiments on diverse datasets…
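The training setup described in the abstract (a fusion module, unimodal prediction modules for auxiliary supervision, and a contrastive term) can be sketched as a single combined objective. The following is a hedged sketch rather than the authors' implementation: concatenation as the fusion step, linear classification heads, the names `training_objective`, `aux_weight`, and `contrastive_weight`, and the placeholder contrastive value are all assumptions made for illustration.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Mean cross-entropy between row-wise softmax(logits) and integer labels.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def training_objective(fused_logits, unimodal_logits, labels,
                       contrastive_loss, aux_weight=1.0,
                       contrastive_weight=1.0):
    """Hypothetical total loss: fused classification loss, plus auxiliary
    unimodal classification losses, plus a contrastive term."""
    fused = cross_entropy(fused_logits, labels)
    aux = sum(cross_entropy(logits, labels) for logits in unimodal_logits)
    return fused + aux_weight * aux + contrastive_weight * contrastive_loss

# Toy usage: random features stand in for encoder outputs of two modalities.
rng = np.random.default_rng(0)
batch, dim, num_classes = 4, 8, 3
h_text = rng.normal(size=(batch, dim))
h_image = rng.normal(size=(batch, dim))
labels = np.array([0, 2, 1, 0])

# Linear heads (assumed): one per modality, one on the concatenated features.
W_text = rng.normal(size=(dim, num_classes))
W_image = rng.normal(size=(dim, num_classes))
W_fused = rng.normal(size=(2 * dim, num_classes))

fused_logits = np.concatenate([h_text, h_image], axis=1) @ W_fused
total = training_objective(
    fused_logits,
    [h_text @ W_text, h_image @ W_image],
    labels,
    contrastive_loss=0.5,  # placeholder value for the contrastive term
)
```

Because the abstract describes the unimodal modules as auxiliary supervision during training, a natural reading is that the unimodal heads shape the encoders while training; how predictions are produced at inference time is not stated in this summary.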
Taxonomy
Topics: Text and Document Classification Technologies · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Contrastive Learning · Mixup
