Integrating Contrastive Learning into a Multitask Transformer Model for Effective Domain Adaptation
Chung-Soo Ahn, Jagath C. Rajapakse, Rajib Rana

TL;DR
This paper introduces a novel domain adaptation method for speech emotion recognition that combines multitask learning with contrastive learning and transformer fine-tuning, achieving state-of-the-art cross-corpus results.
Contribution
It presents a new domain adaptation approach integrating contrastive learning and multitask training with transformer models for improved SER generalization.
Findings
Achieves state-of-the-art cross-corpus SER performance
Effective domain adaptation across different datasets
Enhances generalization in speech emotion recognition
Abstract
While speech emotion recognition (SER) research has made significant progress, achieving generalization across various corpora continues to pose a problem. We propose a novel domain adaptation technique that embodies a multitask framework with SER as the primary task, and contrastive learning and information maximisation loss as auxiliary tasks, underpinned by fine-tuning of transformers pre-trained on large language models. Empirical results obtained through experiments on well-established datasets like IEMOCAP and MSP-IMPROV illustrate that our proposed model achieves state-of-the-art performance in SER within cross-corpus scenarios.
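The abstract names three loss terms (SER classification, contrastive learning, information maximisation) but gives no formulas. The sketch below shows one common way such a multitask objective could be assembled. It is an illustration under stated assumptions, not the authors' implementation: the InfoNCE form of the contrastive term, the choice to apply information maximisation to unlabelled target-corpus predictions, the weights `w_con` and `w_im`, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs, eps=1e-12):
    """Shannon entropy of a probability vector (eps avoids log(0))."""
    return -sum(p * math.log(p + eps) for p in probs)


def cross_entropy(batch_logits, labels):
    """Primary SER term: mean negative log-likelihood of the true labels."""
    losses = [-math.log(softmax(l)[y]) for l, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


def information_maximisation(batch_logits):
    """Mean per-sample entropy minus entropy of the batch-mean prediction.

    The value is low when each prediction is confident and the batch as a
    whole spreads its predictions across classes.
    """
    probs = [softmax(l) for l in batch_logits]
    mean_sample_entropy = sum(entropy(p) for p in probs) / len(probs)
    n_classes = len(probs[0])
    marginal = [sum(p[k] for p in probs) / len(probs) for k in range(n_classes)]
    return mean_sample_entropy - entropy(marginal)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor (assumed form)."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    scaled = [s / temperature for s in sims]
    # The positive pair sits at index 0 of the similarity list.
    return -math.log(softmax(scaled)[0])


def multitask_loss(src_logits, src_labels, tgt_logits, contrastive_term,
                   w_con=1.0, w_im=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (cross_entropy(src_logits, src_labels)
            + w_con * contrastive_term
            + w_im * information_maximisation(tgt_logits))
```

In a setup like this, the labelled source corpus would drive the cross-entropy term, while the auxiliary terms shape the shared transformer representation so that it transfers to the target corpus. How the paper actually pairs samples for the contrastive term and balances the three losses is not stated in the abstract.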
Taxonomy
Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining
Methods: Contrastive Learning
