Turbo your multi-modal classification with contrastive learning
Zhiyu Zhang, Da Liu, Shengqiang Liu, Anna Wang, Jie Gao, and Yali Li

TL;DR
This paper introduces Turbo, a contrastive learning strategy that enhances multi-modal classification by jointly leveraging in-modal and cross-modal contrastive objectives, and reports state-of-the-art results on speech emotion recognition.
Contribution
The paper proposes a novel joint in-modal and cross-modal contrastive learning method called Turbo for improved multi-modal understanding.
Findings
Achieves state-of-the-art performance on speech emotion recognition.
Effectively combines self-supervised and supervised learning.
Demonstrates significant improvement over previous methods.
Abstract
Contrastive learning has become one of the most impressive approaches to multi-modal representation learning. However, previous multi-modal works mainly focused on cross-modal understanding and ignored in-modal contrastive learning, which limits the representation of each modality. In this paper, we propose a novel contrastive learning strategy, called Turbo, to promote multi-modal understanding through joint in-modal and cross-modal contrastive learning. Specifically, multi-modal data pairs are sent through the forward pass twice with different hidden dropout masks to obtain two different representations for each modality. With these representations, we obtain multiple in-modal and cross-modal contrastive objectives for training. Finally, we combine the self-supervised Turbo with the supervised multi-modal classification and demonstrate its effectiveness on two audio-text classification…
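The dual-dropout idea in the abstract can be illustrated with a small, dependency-free sketch. It assumes a toy one-layer encoder per modality (the hypothetical `encode`), cosine-similarity InfoNCE with temperature `tau`, all four audio-text view pairings as the cross-modal objectives, a hypothetical linear classifier `Wc` on concatenated features, and a hypothetical weight `lam` on the contrastive terms. These names and choices are illustrative assumptions, not the authors' implementation; the paper's exact set of objectives and their weighting may differ.

```python
import math
import random

def dropout(vec, p, rng):
    """Inverted dropout: zero each unit with prob p, rescale survivors by 1/(1-p)."""
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in vec]

def encode(x, W, p, rng):
    """Hypothetical one-layer encoder: tanh(W x), then a fresh hidden dropout mask."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    return dropout(hidden, p, rng)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1e-12
    nb = math.sqrt(sum(y * y for y in b)) or 1e-12
    return dot / (na * nb)

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def info_nce(anchors, positives, tau):
    """InfoNCE: positives[i] is the positive for anchors[i]; other rows are negatives."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / tau for p in positives]
        total += logsumexp(logits) - logits[i]  # -log softmax of the positive
    return total / len(anchors)

def turbo_style_loss(audio, text, labels, Wa, Wt, Wc, p=0.1, tau=0.1, lam=1.0, rng=None):
    rng = rng or random.Random(0)
    # Two forward passes per modality, each with a different dropout mask.
    a1 = [encode(x, Wa, p, rng) for x in audio]
    a2 = [encode(x, Wa, p, rng) for x in audio]
    t1 = [encode(x, Wt, p, rng) for x in text]
    t2 = [encode(x, Wt, p, rng) for x in text]
    # In-modal: the two dropout views of the same sample are positives.
    in_modal = info_nce(a1, a2, tau) + info_nce(t1, t2, tau)
    # Cross-modal (assumed): audio and text of the same pair, over all four view pairings.
    cross_modal = sum(info_nce(av, tv, tau) for av in (a1, a2) for tv in (t1, t2))
    # Supervised classification on concatenated first-pass features.
    ce = 0.0
    for av, tv, y in zip(a1, t1, labels):
        fused = av + tv  # list concatenation
        logits = [sum(w * f for w, f in zip(row, fused)) for row in Wc]
        ce += logsumexp(logits) - logits[y]
    ce /= len(labels)
    return ce + lam * (in_modal + cross_modal)

# Usage on random toy data: 4 pairs, 6-dim audio, 5-dim text, 3 classes.
rng = random.Random(42)
def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

audio = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(4)]
text = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(4)]
labels = [0, 1, 2, 1]
Wa, Wt, Wc = rand_matrix(8, 6), rand_matrix(8, 5), rand_matrix(3, 16)
loss = turbo_style_loss(audio, text, labels, Wa, Wt, Wc, rng=random.Random(7))
print(round(loss, 4))
```

Because dropout is applied independently on each pass, the two views of the same input differ slightly, which is what gives the in-modal objective a non-trivial positive pair without any explicit data augmentation.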
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and Dialogue Systems
Methods: Dropout · Contrastive Learning
