Combining Contrastive and Non-Contrastive Losses for Fine-Tuning Pretrained Models in Speech Analysis
Florian Lux, Ching-Yi Chen, Ngoc Thang Vu

TL;DR
This paper introduces a two-step fine-tuning method for speech models that combines contrastive and non-contrastive losses to improve class invariance and discriminability in paralinguistic tasks.
Contribution
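It proposes a two-step approach: first, the embedding space is improved with a combination of contrastive and non-contrastive losses; then an adapter is trained for task-specific classification. The approach outperforms baselines that are fine-tuned end-to-end.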
Findings
Outperforms end-to-end fine-tuning baselines on multiple tasks.
Surpasses the state of the art on an emotion classification benchmark.
Improves class invariance and discriminability in embeddings.
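One rough way to make "class invariance" and "discriminability" concrete is to compare the mean distance between embeddings of the same class with the mean distance between embeddings of different classes. The sketch below is only an illustration under that assumption; the function names and the choice of Euclidean distance are hypothetical and are not taken from the paper, which may evaluate these properties differently.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_pairwise_distances(embeddings, labels):
    """Return (mean intra-class distance, mean inter-class distance).

    Illustrative metric only: a small intra-class mean suggests class
    invariance, a large inter-class mean suggests discriminability.
    Assumes at least one same-class pair and one cross-class pair exist.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        d = euclidean(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            intra.append(d)
        else:
            inter.append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Toy 2-D "embeddings" for two classes (made-up numbers, not real data).
toy_embeddings = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
toy_labels = ["a", "a", "b", "b"]
intra_mean, inter_mean = mean_pairwise_distances(toy_embeddings, toy_labels)
```

In this toy case the same-class points sit closer together than the cross-class points, which is the pattern a class-invariant yet discriminative embedding space would be expected to show.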
Abstract
Embedding paralinguistic properties is a challenging task, as only a few hours of training data are available for domains such as emotional speech. One solution to this problem is to pretrain a general self-supervised speech representation model on large amounts of unlabeled speech. This pretrained model is then fine-tuned for a specific task. Paralinguistic properties, however, have notoriously high class variance, which makes the fine-tuning ineffective. In this work, we propose a two-step approach to this problem. First, we improve the embedding space; then we train an adapter to bridge the gap from the embedding space to a classification task. To improve class invariance, we use a combination of contrastive and non-contrastive losses to explicitly optimize for class-invariant yet discriminative features. Our approach consistently outperforms baselines that are fine-tuned end-to-end on…
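The idea of combining a contrastive and a non-contrastive objective can be sketched in a few lines. The abstract does not name the specific losses, so the sketch below makes its own assumptions: a triplet margin loss on cosine distance stands in for the contrastive term (it needs a negative example), and a plain same-class alignment term stands in for the non-contrastive one (it uses only a positive pair). The function names, the `weight` and `margin` parameters, and their default values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Contrastive term (assumed): push the negative at least `margin`
    further from the anchor than the positive, in cosine distance."""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def alignment_loss(anchor, positive):
    """Non-contrastive term (assumed): pull two same-class embeddings
    together without using any negative example."""
    return 1.0 - cosine(anchor, positive)


def combined_loss(anchor, positive, negative, weight=1.0, margin=0.5):
    """Weighted sum of the two terms; `weight` is a hypothetical knob."""
    return (triplet_loss(anchor, positive, negative, margin)
            + weight * alignment_loss(anchor, positive))


# Well-separated case: positive matches the anchor, negative is orthogonal.
good = combined_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# Badly-separated case: negative matches the anchor, positive is orthogonal.
bad = combined_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

The two terms pull in complementary directions: the triplet term only cares about the relative ordering of positive and negative distances, while the alignment term keeps shrinking same-class distances even after the margin is satisfied.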
Taxonomy
Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining
Methods: Adapter
