Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition

Ayoub Ghriss; Bo Yang; Viktor Rozgic; Elizabeth Shriberg; Chao Wang

arXiv:2201.11826·cs.CL·January 31, 2022·1 cites

TL;DR

This paper introduces a multi-task pre-training approach that combines automatic speech recognition and sentiment classification to improve speech emotion recognition, achieving the best reported concordance correlation coefficient (CCC) for valence prediction on MSP-Podcast.

Contribution

It presents a multi-task pre-training method that makes an acoustic ASR model sentiment-aware by training it jointly on speech recognition and sentiment classification, before fine-tuning it for speech emotion recognition.
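The contribution rests on a joint objective over the two pre-training tasks. The source does not give the loss formulation, so the sketch below assumes a weighted sum: whatever sequence loss the ASR branch uses (passed in as a plain number) plus a cross-entropy term on the sentiment head, scaled by a hypothetical weight `sentiment_weight`. All function names are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_cross_entropy(logits: Sequence[float], target: Sequence[float]) -> float:
    # Cross-entropy between predicted logits and a target distribution;
    # with a one-hot target this is ordinary classification cross-entropy.
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))


def multitask_loss(asr_loss: float,
                   sentiment_logits: Sequence[float],
                   sentiment_target: Sequence[float],
                   sentiment_weight: float = 0.5) -> float:
    # Hypothetical joint pre-training objective: ASR loss plus a weighted
    # sentiment-classification loss. asr_loss stands in for the value of
    # whatever sequence criterion the ASR branch is trained with.
    return asr_loss + sentiment_weight * soft_cross_entropy(sentiment_logits, sentiment_target)


# Usage: uniform sentiment logits against a one-hot "positive" target.
print(multitask_loss(2.0, [0.0, 0.0, 0.0], [0.0, 0.0, 1.0], sentiment_weight=0.5))
```

Setting `sentiment_weight` to 0 recovers plain ASR pre-training, which makes the weight a natural knob for trading transcription accuracy against emotion awareness.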

Findings

1. Achieved a CCC of 0.41 for valence prediction on MSP-Podcast.
2. Demonstrated improved emotion recognition performance over baseline models.
3. Proposed a sentiment-aware pre-training framework for speech models.
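CCC, the metric behind the 0.41 figure, rewards predictions that agree with the gold ratings in correlation, mean, and scale at once. A minimal pure-Python sketch of Lin's definition with population (1/N) moments follows; the function name is illustrative, and how the authors aggregate scores (for example, over the whole test set) is an assumption here, not taken from the paper.

```python
from typing import Sequence


def concordance_correlation_coefficient(pred: Sequence[float], gold: Sequence[float]) -> float:
    # Lin's CCC: 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    # computed with population (1/N) moments.
    n = len(pred)
    if n == 0 or n != len(gold):
        raise ValueError("inputs must be non-empty and of equal length")
    mx = sum(pred) / n
    my = sum(gold) / n
    vx = sum((x - mx) ** 2 for x in pred) / n
    vy = sum((y - my) ** 2 for y in gold) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(pred, gold)) / n
    denom = vx + vy + (mx - my) ** 2
    if denom == 0:
        # Both sequences constant and equal: CCC is undefined.
        raise ValueError("CCC is undefined for identical constant inputs")
    return 2 * cov / denom


# Usage: perfectly correlated predictions that sit one unit too high.
print(concordance_correlation_coefficient([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))
```

Unlike Pearson correlation, CCC penalizes a constant offset: the predictions above have a Pearson correlation of 1 with the gold ratings but a CCC of only 4/7.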

Abstract

We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). We pre-train the SER model simultaneously on Automatic Speech Recognition (ASR) and sentiment classification tasks to make the acoustic ASR model more "emotion-aware". We generate the targets for sentiment classification using a text-to-sentiment model trained on publicly available data. Finally, we fine-tune the acoustic ASR model on emotion-annotated speech data. We evaluated the proposed approach on the MSP-Podcast dataset, where we achieved the best reported concordance correlation coefficient (CCC) of 0.41 for valence prediction.
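Because the sentiment targets come from a text-to-sentiment model, the ASR pre-training data needs no human sentiment labels. The sketch below assumes the text model acts as a "teacher" run on each utterance's transcript and returns a probability per class; the three-class label set, the `toy_lexicon_model` stand-in, the `PretrainExample` record, and the option to collapse to hard labels are all hypothetical, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical label set; the number and names of classes are assumptions.
LABELS = ("negative", "neutral", "positive")

# Any callable mapping a transcript to one probability per label can serve
# as the text-to-sentiment teacher.
TextSentimentModel = Callable[[str], Sequence[float]]


@dataclass
class PretrainExample:
    audio_id: str
    transcript: str                # target for the ASR task
    sentiment_target: List[float]  # target for the sentiment task


def toy_lexicon_model(text: str) -> List[float]:
    # Illustrative stand-in teacher only: counts a few cue words.
    words = text.lower().split()
    pos = sum(w in {"great", "love", "happy"} for w in words)
    neg = sum(w in {"bad", "hate", "sad"} for w in words)
    if pos > neg:
        return [0.1, 0.2, 0.7]
    if neg > pos:
        return [0.7, 0.2, 0.1]
    return [0.2, 0.6, 0.2]


def build_targets(corpus: Dict[str, str], teacher: TextSentimentModel,
                  hard: bool = False) -> List[PretrainExample]:
    # Attach a teacher-generated sentiment target to every transcribed utterance.
    examples = []
    for audio_id, transcript in corpus.items():
        probs = list(teacher(transcript))
        if hard:
            # Collapse to a one-hot label at the teacher's most likely class.
            best = max(range(len(probs)), key=probs.__getitem__)
            probs = [1.0 if i == best else 0.0 for i in range(len(probs))]
        examples.append(PretrainExample(audio_id, transcript, probs))
    return examples


corpus = {"utt1": "i love this", "utt2": "this is bad"}
for ex in build_targets(corpus, toy_lexicon_model, hard=True):
    print(ex.audio_id, LABELS[ex.sentiment_target.index(1.0)])
```

Keeping the soft distribution (`hard=False`) preserves the teacher's uncertainty, which pairs naturally with a soft cross-entropy loss on the sentiment head.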

Taxonomy

Topics: Speech Recognition and Synthesis · Sentiment Analysis and Opinion Mining · Speech and Audio Processing