Multiscale Contextual Learning for Speech Emotion Recognition in Emergency Call Center Conversations
Théo Deschamps-Berger, Lori Lamel, Laurence Devillers

TL;DR
This paper introduces a multiscale conversational context learning approach for speech emotion recognition in emergency calls. It shows that context improves prediction accuracy, especially for text, while context is harder to exploit for acoustic data.
Contribution
The study proposes a novel multi-scale context learning method for speech emotion recognition and evaluates its effectiveness on real emergency call data, highlighting the importance of context in emotion prediction.
Findings
Context from previous tokens significantly improves prediction accuracy.
Using the last speech turn of the same speaker is beneficial.
Transformers enhance text-based emotion recognition, but acoustic context remains challenging.
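The two kinds of context named in the findings (preceding tokens, and the last turn of the same speaker) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Turn` class, the helper names, whitespace tokenization, and the `[SEP]` separator are all assumptions made for illustration, and the paper's actual tokenizer and input format are not described in this summary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    speaker: str  # hypothetical labels, e.g. "agent" or "patient"
    text: str


def previous_tokens(turns: List[Turn], idx: int, n_tokens: int) -> List[str]:
    """Return up to n_tokens whitespace tokens spoken just before turn idx."""
    if n_tokens <= 0:
        return []
    tokens: List[str] = []
    for turn in turns[:idx]:
        tokens.extend(turn.text.split())
    return tokens[-n_tokens:]


def last_same_speaker_turn(turns: List[Turn], idx: int) -> Optional[Turn]:
    """Return the most recent earlier turn by the speaker of turn idx, if any."""
    speaker = turns[idx].speaker
    for turn in reversed(turns[:idx]):
        if turn.speaker == speaker:
            return turn
    return None


def build_input(turns: List[Turn], idx: int, n_tokens: int, sep: str = "[SEP]") -> str:
    """Prefix the target turn with its token-level context (assumed format)."""
    context = " ".join(previous_tokens(turns, idx, n_tokens))
    target = turns[idx].text
    return f"{context} {sep} {target}" if context else target
```

Varying `n_tokens` gives context at different scales, and `last_same_speaker_turn` isolates the speaker-specific context; either string could then be passed to a text classifier of the reader's choice.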
Abstract
Emotion recognition in conversations is essential for ensuring advanced human-machine interactions. However, creating robust and accurate emotion recognition systems in real life is challenging, mainly due to the scarcity of emotion datasets collected in the wild and the inability to take into account the dialogue context. The CEMO dataset, composed of conversations between agents and patients during emergency calls to a French call center, fills this gap. The nature of these interactions highlights the role of the emotional flow of the conversation in predicting patient emotions, as context can often make a difference in understanding actual feelings. This paper presents a multi-scale conversational context learning approach for speech emotion recognition, which takes advantage of this hypothesis. We investigated this approach on both speech transcriptions and acoustic segments.…
Taxonomy
Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Speech and dialogue systems
