Speech and Text-Based Emotion Recognizer

Varun Sharma

arXiv:2312.11503·cs.CL·December 20, 2023·1 cites

TL;DR

This paper builds a balanced training corpus by combining publicly available datasets and applying speech data augmentation, then trains a multi-modal speech and text emotion recognition model that outperforms the baseline.

Contribution

It introduces a method to create a balanced corpus from existing datasets and explores various architectures for enhanced speech emotion recognition.

Findings

1. The best model, a multi-modal speech and text system, achieved a UA + WA score of 157.57.

2. Data augmentation and dataset balancing were both shown to be effective.

3. The best model outperformed the baseline algorithm, which scored a UA + WA of 119.66.

Abstract

Affective computing is a field of study that focuses on developing systems and technologies that can understand, interpret, and respond to human emotions. Speech Emotion Recognition (SER), in particular, has received considerable attention from researchers in recent years. However, in many cases, the publicly available datasets used for training and evaluation are scarce and imbalanced across the emotion labels. In this work, we focused on building a balanced corpus from these publicly available datasets by combining them and by employing various speech data augmentation techniques. Furthermore, we experimented with different architectures for speech emotion recognition. Our best system, a multi-modal speech and text-based model, achieves a UA (Unweighted Accuracy) + WA (Weighted Accuracy) of 157.57, compared to the baseline algorithm's 119.66.
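The abstract reports a combined UA + WA score but does not spell out how the two terms are computed. The sketch below assumes the convention common in SER work: WA is overall accuracy in percent, and UA is the mean of per-class recalls in percent, so the sum has a maximum of 200. The function name ua_wa and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def ua_wa(y_true, y_pred):
    """Return (UA, WA) in percent.

    Assumed definitions (not confirmed by the paper):
      WA = overall accuracy across all samples
      UA = mean of per-class recalls, each class counted equally
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # number of correct predictions per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    wa = 100.0 * sum(correct.values()) / len(y_true)
    ua = 100.0 * sum(correct[c] / total[c] for c in total) / len(total)
    return ua, wa


# Toy imbalanced example: a model that always predicts the majority class.
y_true = ["neutral"] * 6 + ["sad"] * 2
y_pred = ["neutral"] * 8
ua, wa = ua_wa(y_true, y_pred)
print(f"UA={ua:.2f} WA={wa:.2f} UA+WA={ua + wa:.2f}")
```

In this toy case WA is 75 while UA is only 50, which illustrates why reporting both matters when emotion labels are imbalanced: a majority-class predictor looks acceptable on WA alone.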

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Speech and dialogue systems