LanSER: Language-Model Supported Speech Emotion Recognition

Taesik Gong; Josh Belanich; Krishna Somandepalli; Arsha Nagrani; Brian Eoff; Brendan Jou

arXiv:2309.03978·cs.CL·September 11, 2023

TL;DR

LanSER introduces a weakly-supervised approach for speech emotion recognition that leverages large language models to infer emotion labels from speech transcripts, reducing reliance on costly labeled data.

Contribution

The paper proposes using pre-trained language models with textual entailment to generate weak emotion labels for SER, which improves scalability and label efficiency.

Findings

01  Models pre-trained on large datasets with this weak supervision outperform baseline models on standard SER datasets when fine-tuned.

02  The approach improves label efficiency in speech emotion recognition.

03  Although the weak labels are derived only from text, the resulting representations capture prosodic content of speech.

Abstract

Speech emotion recognition (SER) models typically rely on costly human-labeled data for training, making scaling methods to large speech datasets and nuanced emotion taxonomies difficult. We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly-supervised learning. For inferring weak labels constrained to a taxonomy, we use a textual entailment approach that selects an emotion label with the highest entailment score for a speech transcript extracted via automatic speech recognition. Our experimental results show that models pre-trained on large datasets with this weak supervision outperform other baseline models on standard SER datasets when fine-tuned, and show improved label efficiency. Despite being pre-trained on labels derived only from text, we show that the resulting representations…
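
The label-selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the hypothesis template, the function names, and the keyword-based scorer are all assumptions made for the example. In practice the scorer would be a pre-trained entailment (NLI) model applied to an ASR transcript; the toy scorer here only stands in for it so the sketch runs without external dependencies.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical hypothesis template; the exact prompt wording is an assumption.
HYPOTHESIS_TEMPLATE = "The emotion of this utterance is {}."


def infer_weak_label(
    transcript: str,
    taxonomy: Sequence[str],
    entail_score: Callable[[str, str], float],
    template: str = HYPOTHESIS_TEMPLATE,
) -> Tuple[str, Dict[str, float]]:
    """Return the taxonomy label whose hypothesis gets the highest entailment score."""
    if not taxonomy:
        raise ValueError("taxonomy must contain at least one label")
    # Score one hypothesis per candidate emotion against the transcript (the premise).
    scores = {
        label: entail_score(transcript, template.format(label))
        for label in taxonomy
    }
    # Ties resolve to the label that appears first in the taxonomy.
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-in for an entailment model (assumption): counts emotion cue words.
# A real system would replace this with an NLI model's entailment probability.
_CUES = {
    "happy": {"great", "love", "wonderful"},
    "sad": {"miss", "lost", "cry"},
    "angry": {"hate", "furious", "unfair"},
}


def toy_entail_score(premise: str, hypothesis: str) -> float:
    cleaned = premise.lower()
    for ch in ",.!?":
        cleaned = cleaned.replace(ch, " ")
    words = set(cleaned.split())
    score = 0.0
    for label, cues in _CUES.items():
        # Only count cues for the emotion named in this hypothesis.
        if label in hypothesis.lower():
            score += len(words & cues)
    return score


label, scores = infer_weak_label(
    "I love this, it is wonderful!", ["happy", "sad", "angry"], toy_entail_score
)
print(label, scores)
```

Keeping the scorer as a plain callable means the taxonomy can be changed without retraining anything, which matches the abstract's point about constraining weak labels to a chosen taxonomy.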
