Extending Multi-modal Contrastive Representations

Zehan Wang, Ziang Zhang, Luping Liu, Yang Zhao, Haifeng Huang, Tao Jin, Zhou Zhao

arXiv:2310.08884 · cs.CV · October 16, 2023 · 2 citations

TL;DR

This paper introduces Ex-MCR, a training-efficient, paired-data-free method that extends multi-modal contrastive representations (MCRs) to more than three modalities by aligning existing MCR spaces, achieving state-of-the-art results.

Contribution

Ex-MCR is the first approach to extend multi-modal contrastive representations without paired data, integrating multiple existing MCRs into a unified space with improved performance.

Findings

1. Achieves state-of-the-art results on multiple retrieval tasks.
2. Learns a 3D-image-text-audio unified contrastive space without paired data.
3. Demonstrates emergent semantic alignment between extended modalities.
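The retrieval and emergent-alignment findings above concern modalities that never appeared together during training (for example, audio from one existing MCR and 3D shapes from another). A minimal sketch of why a shared base space makes this possible: once each leaf MCR's embeddings are mapped into the base space, items from different leaf MCRs can be ranked by plain cosine similarity. The projectors `W_audio` and `W_3d`, the function names, and the toy 2-D vectors are hypothetical; real embeddings would come from the frozen encoders of the existing MCRs, and Ex-MCR's actual projector design is not reproduced here.

```python
import math

def to_unified(W, v):
    """Map a leaf-space embedding into the base space with a hypothetical linear projector W."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, gallery, k=1):
    """Return the indices of the k gallery embeddings most similar to query."""
    ranked = sorted(range(len(gallery)), key=lambda i: cosine(query, gallery[i]), reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    # Toy projectors for two different leaf MCRs (values are made up for illustration).
    W_audio = [[1.0, 0.0], [0.0, 1.0]]
    W_3d = [[0.0, 1.0], [1.0, 0.0]]
    audio_query = to_unified(W_audio, [1.0, 0.0])
    shapes = [to_unified(W_3d, s) for s in ([1.0, 0.0], [0.0, 1.0])]
    print(retrieve(audio_query, shapes))  # [1]
```

The design point is that no audio-3D pairs are needed at retrieval time: both sides only have to land in the same base space.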

Abstract

Multi-modal contrastive representation (MCR) of more than three modalities is critical in multi-modal learning. Although recent methods showcase impressive achievements, their high dependence on large-scale, high-quality paired data and their expensive training costs limit further development. Inspired by the recent C-MCR, this paper proposes Extending Multi-modal Contrastive Representation (Ex-MCR), a training-efficient and paired-data-free method that flexibly learns a unified contrastive representation space for more than three modalities by integrating the knowledge of existing MCR spaces. Specifically, Ex-MCR aligns multiple existing MCRs into the same base MCR, which effectively preserves the original semantic alignment of the base MCR. Besides, we comprehensively enhance the entire learning pipeline for aligning MCR spaces from the perspectives of training data, architecture, and…
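The core step described in the abstract, aligning a "leaf" MCR into a frozen base MCR, can be illustrated with a minimal sketch. It assumes a single linear projector `W`, embeddings of the same data already computed in both spaces (for instance through a modality the two MCRs share), and a symmetric InfoNCE objective; these are illustrative assumptions, not Ex-MCR's exact architecture or loss, and all names are hypothetical. Only `W` would be trained while the base embeddings stay fixed, which is one way to see why aligning into the base space leaves the base MCR's own alignment untouched.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def project(W, v):
    """Apply a hypothetical linear projector W (list of rows) to a leaf-space embedding v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def info_nce(queries, keys, temperature=0.07):
    """Mean InfoNCE loss where queries[i] must match keys[i] against all keys."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(queries)

def alignment_loss(W, leaf_embs, base_embs, temperature=0.07):
    """Symmetric InfoNCE between projected leaf embeddings and frozen base embeddings."""
    projected = [normalize(project(W, v)) for v in leaf_embs]
    base = [normalize(v) for v in base_embs]
    return 0.5 * (info_nce(projected, base, temperature)
                  + info_nce(base, projected, temperature))

if __name__ == "__main__":
    leaf = [[1.0, 0.0], [0.0, 1.0]]   # toy leaf-space embeddings
    base = [[1.0, 0.0], [0.0, 1.0]]   # toy frozen base-space embeddings
    identity = [[1.0, 0.0], [0.0, 1.0]]
    swap = [[0.0, 1.0], [1.0, 0.0]]
    print(alignment_loss(identity, leaf, base))  # near 0: each item maps onto its partner
    print(alignment_loss(swap, leaf, base))      # about 14.29: every pair is mismatched
```

In practice the gradient of this loss with respect to `W` would be computed by an autodiff framework; the sketch only shows what is being minimized.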

Code & Models

Repositories

mcr-peft/ex-mcr (PyTorch, official)

Videos

Extending Multi-modal Contrastive Representations · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: ALIGN · Contrastive Language-Image Pre-training