CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss
Rakshith Sharma Srinivasa, Jaejin Cho, Chouchang Yang, Yashas Malur Saidutta, Ching-Hua Lee, Yilin Shen, Hongxia Jin

TL;DR
This paper introduces CWCL, a novel continuous similarity-based contrastive loss for cross-modal zero-shot transfer, improving alignment and performance across image-text and speech-text tasks over existing methods.
Contribution
The paper proposes the Continuously Weighted Contrastive Loss (CWCL), a new loss function that models similarity as a continuous measure, enhancing cross-modal representation alignment.
Findings
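Achieves a 5-8% (absolute) improvement over previous state-of-the-art methods in zero-shot image classification.
Achieves a 20-30% (absolute) improvement over previous state-of-the-art methods in zero-shot speech-to-intent classification.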
Outperforms existing methods across multiple models, datasets, and modalities.
Abstract
This paper considers contrastive training for cross-modal zero-shot transfer wherein a pre-trained model in one modality is used for representation learning in another domain using pairwise data. The learnt models in the latter domain can then be used for a diverse set of tasks in a zero-shot way, similar to "Contrastive Language-Image Pre-training (CLIP)" and "Locked-image Tuning (LiT)" that have recently gained considerable attention. Most existing works for cross-modal representation alignment (including CLIP and LiT) use the standard contrastive training objective, which employs sets of positive and negative examples to align similar and repel dissimilar training data samples. However, similarity amongst training examples has a more continuous nature, thus calling for a more "non-binary" treatment. To address this, we propose a novel loss function called Continuously Weighted…
Taxonomy
Topics: Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research · Domain Adaptation and Few-Shot Learning
Methods: ALIGN · Contrastive Language-Image Pre-training
