Learning Spontaneity to Improve Emotion Recognition In Speech
Karttikeya Mangalam, Tanaya Guha

TL;DR
This paper explores how detecting spontaneity in speech can enhance emotion recognition accuracy, proposing models that jointly learn spontaneity and emotion, leading to state-of-the-art results on the IEMOCAP dataset.
Contribution
The paper introduces spontaneity detection as an auxiliary task for speech emotion recognition and reports state-of-the-art performance with this approach.
Findings
Spontaneity detection improves emotion recognition accuracy.
Hierarchical and multitask models outperform baselines.
Achieved 69.1% accuracy on IEMOCAP for 4-class emotion recognition.
Abstract
We investigate the effect and usefulness of spontaneity in speech (i.e., whether a given utterance is spontaneous or not) in the context of emotion recognition. We hypothesize that emotional content in speech is interrelated with its spontaneity, and use spontaneity classification as an auxiliary task to the problem of emotion recognition. We propose two supervised learning settings that utilize spontaneity to improve speech emotion recognition: a hierarchical model that performs spontaneity detection before performing emotion recognition, and a multitask learning model that jointly learns to recognize both spontaneity and emotion. Through various experiments on the well-known IEMOCAP database, we show that by using spontaneity detection as an additional task, significant improvement can be achieved over emotion recognition systems that are unaware of spontaneity. We achieve…
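The hierarchical setting described in the abstract (detect spontaneity first, then recognize emotion) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the nearest-centroid classifier is a toy stand-in (this summary does not state which features or classifiers the paper uses), the class names `NearestCentroid` and `HierarchicalEmotionModel` are hypothetical, and the four 2-D feature vectors are made-up values, not IEMOCAP data.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Toy stand-in classifier: predicts the label whose class mean is closest."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        # One centroid (per-dimension mean) for each label.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, features):
        return min(self.centroids, key=lambda lbl: dist(features, self.centroids[lbl]))


class HierarchicalEmotionModel:
    """Stage 1 predicts spontaneity; stage 2 uses an emotion classifier
    trained only on utterances with that spontaneity label."""

    def fit(self, X, spontaneity, emotion):
        self.spont_clf = NearestCentroid().fit(X, spontaneity)
        self.emotion_clfs = {}
        for s in set(spontaneity):
            idx = [i for i, value in enumerate(spontaneity) if value == s]
            self.emotion_clfs[s] = NearestCentroid().fit(
                [X[i] for i in idx], [emotion[i] for i in idx]
            )
        return self

    def predict_one(self, features):
        s = self.spont_clf.predict_one(features)
        return s, self.emotion_clfs[s].predict_one(features)


# Hypothetical toy data: 2-D features with spontaneity and emotion labels.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
spontaneity = ["scripted", "scripted", "spontaneous", "spontaneous"]
emotion = ["neutral", "angry", "neutral", "happy"]

model = HierarchicalEmotionModel().fit(X, spontaneity, emotion)
```

Calling `model.predict_one(features)` returns a `(spontaneity, emotion)` pair. The point of the design is that each emotion classifier only has to separate emotions within one speaking style, which is one way a spontaneity-aware system could differ from a single classifier trained on all utterances.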
