Combining Contrastive and Non-Contrastive Losses for Fine-Tuning Pretrained Models in Speech Analysis

Florian Lux; Ching-Yi Chen; Ngoc Thang Vu

arXiv:2211.01964·cs.CL·November 4, 2022

Open Access

TL;DR

This paper introduces a two-step fine-tuning method for speech models that combines contrastive and non-contrastive losses to improve class invariance and discriminability in paralinguistic tasks.

Contribution

It proposes a two-step approach that first improves the quality of the embedding space and then trains an adapter for task-specific classification, outperforming baselines that are finetuned end-to-end.

Findings

01 Outperforms end-to-end fine-tuning baselines on multiple tasks.

02 Surpasses the state of the art on an emotion classification benchmark.

03 Improves the class invariance and discriminability of the embeddings.

Abstract

Embedding paralinguistic properties is a challenging task, as only a few hours of training data are available for domains such as emotional speech. One solution to this problem is to pretrain a general self-supervised speech representation model on large amounts of unlabeled speech. This pretrained model is then finetuned to a specific task. Paralinguistic properties, however, have notoriously high class variance, making the finetuning ineffective. In this work, we propose a two-step approach to this problem. First, we improve the embedding space; then, we train an adapter to bridge the gap from the embedding space to a classification task. To improve the class invariance, we use a combination of contrastive and non-contrastive losses to explicitly optimize for class-invariant, yet discriminative features. Our approach consistently outperforms baselines that are finetuned end-to-end on…
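The idea of combining a contrastive and a non-contrastive objective can be sketched as follows. The abstract as shown here does not name the specific losses, so this is only an illustration under stated assumptions: a triplet margin loss stands in for the contrastive term, and a Barlow Twins-style cross-correlation loss stands in for the non-contrastive term. The function names, the weights `alpha` and `beta`, the margin, and the off-diagonal weight are all hypothetical and not taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Contrastive term (assumed): same-class pairs should be closer
    than different-class pairs by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

def cross_correlation_loss(z_a, z_b, off_diag_weight=5e-3, eps=1e-8):
    """Non-contrastive term (assumed, Barlow Twins style): uses only
    same-class pairs and needs no negatives."""
    # Standardise every embedding dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two batches, shape (dim, dim).
    c = z_a.T @ z_b / z_a.shape[0]
    diag = np.diag(c)
    on_diag = np.sum((diag - 1.0) ** 2)       # invariance within a class
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # decorrelate dimensions
    return float(on_diag + off_diag_weight * off_diag)

def combined_loss(anchor, positive, negative, alpha=1.0, beta=1.0, margin=1.0):
    """Weighted sum of both terms; alpha and beta are hypothetical weights."""
    contrastive = triplet_loss(anchor, positive, negative, margin)
    non_contrastive = cross_correlation_loss(anchor, positive)
    return alpha * contrastive + beta * non_contrastive

# Toy batch: 16 embeddings of dimension 8 (random stand-ins for speech embeddings).
rng = np.random.default_rng(0)
anchor = rng.normal(size=(16, 8))
positive = anchor + 0.1 * rng.normal(size=(16, 8))  # same class, small perturbation
negative = rng.normal(size=(16, 8)) + 3.0           # different class, shifted
print(combined_loss(anchor, positive, negative))
```

In this sketch the triplet term supplies the discriminative pressure between classes, while the cross-correlation term only looks at same-class pairs and so targets invariance. In the two-step setup described above, a loss of this kind would shape the embedding space first, and a separate adapter would then be trained for classification.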

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining

Methods: Adapter