TL;DR
This paper introduces TupleInfoNCE, a novel contrastive learning method for multimodal data that enhances the learning of both shared and complementary information across modalities, improving downstream task performance.
Contribution
It proposes a contrastive loss that contrasts tuples using positive and negative correspondences plus composed negatives (tuples assembled from modalities describing different scenes), supported by a mutual-information justification and an optimized sampling strategy.
Findings
Outperforms previous methods on three downstream tasks
Effectively captures shared and complementary multimodal information
Ensures weaker modalities are not ignored during learning
Abstract
This paper proposes a method for representation learning of multimodal data using contrastive losses. A traditional approach is to contrast different modalities to learn the information shared between them. However, that approach could fail to learn the complementary synergies between modalities that might be useful for downstream tasks. Another approach is to concatenate all the modalities into a tuple and then contrast positive and negative tuple correspondences. However, that approach could consider only the stronger modalities while ignoring the weaker ones. To address these issues, we propose a novel contrastive learning objective, TupleInfoNCE. It contrasts tuples based not only on positive and negative correspondences but also by composing new negative tuples using modalities describing different scenes. Training with these additional negatives encourages the learning model to…
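The idea of composing extra negatives can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the two-modality setup, the concatenation-based `encode_tuple` stand-in for a fusion encoder, the `shift` trick for picking a different scene, and all function and parameter names are assumptions made for illustration. Each anchor tuple is contrasted against its matching tuple (positive), against tuples from other scenes in the batch (ordinary negatives), and against composed tuples in which one modality is swapped for that of a different scene.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products act as cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def encode_tuple(mod_a, mod_b):
    """Hypothetical stand-in for a learned fusion encoder: concatenate, then normalize."""
    return l2_normalize(np.concatenate([mod_a, mod_b], axis=-1))


def tuple_infonce_sketch(a, b, a_pos, b_pos, temperature=0.1, shift=1,
                         use_composed=True):
    """InfoNCE-style loss over tuples, optionally with composed negatives.

    a, b         : (n, d_a), (n, d_b) features of the anchor tuples, one row per scene
    a_pos, b_pos : features of the matching (positive) tuples, same scene order
    shift        : row offset used to borrow a modality from a different scene
    """
    n = a.shape[0]
    assert shift % n != 0, "shift must select a different scene"

    anchor = encode_tuple(a, b)
    positive = encode_tuple(a_pos, b_pos)

    # Row i, column i is the positive pair; other columns are other-scene negatives.
    logits = anchor @ positive.T / temperature

    if use_composed:
        # Composed negatives: keep one modality, take the other from another scene.
        swap_b = encode_tuple(a_pos, np.roll(b_pos, shift, axis=0))
        swap_a = encode_tuple(np.roll(a_pos, shift, axis=0), b_pos)
        extra = [np.sum(anchor * neg, axis=1, keepdims=True) / temperature
                 for neg in (swap_b, swap_a)]
        logits = np.concatenate([logits] + extra, axis=1)

    # Numerically stable cross-entropy with the positive at column i for row i.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(n)
    return float(-np.mean(log_prob[idx, idx]))
```

Because a composed negative differs from the positive in only one modality, the only way to score it lower than the positive is to use the swapped modality. In this sketch that is the mechanism meant to keep a weaker modality from being ignored.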
Taxonomy
Methods: Contrastive Learning
