Detecting Emotion Carriers by Combining Acoustic and Lexical Representations
Sebastian P. Bayerl, Aniruddha Tammewar, Korbinian Riedhammer, and Giuseppe Riccardi

TL;DR
This paper proposes a method to detect emotion carriers in spoken narratives by combining acoustic and lexical representations using neural networks and fusion techniques, aiming to enhance emotional understanding in dialogue systems.
Contribution
It introduces a novel approach that leverages word-based acoustic and textual embeddings with fusion strategies to identify emotion carriers in spoken narratives.
Findings
Late fusion improves detection accuracy significantly.
ResNet-based acoustic embeddings enhance emotion carrier detection.
Combining acoustic and lexical features outperforms lexical-only methods.
Abstract
Personal narratives (PN), spoken or written, are recollections of facts, people, events, and thoughts from one's own experience. Emotion recognition and sentiment analysis tasks are usually defined at the utterance or document level. However, in this work, we focus on Emotion Carriers (EC), defined as the segments (speech or text) that best explain the emotional state of the narrator ("loss of father", "made me choose"). Once extracted, such EC can provide a richer representation of the user state to improve natural language understanding and dialogue modeling. In previous work, it has been shown that EC can be identified using lexical features. However, spoken narratives should provide a richer description of the context and the users' emotional state. In this paper, we leverage word-based acoustic and textual embeddings as well as early and late fusion techniques for the detection of…
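The abstract contrasts early and late fusion of word-level acoustic and lexical embeddings. As an illustration only, the Python sketch below shows the difference on toy data. It assumes each word already has a fixed-size acoustic embedding (for example, pooled from a ResNet over that word's speech segment) and a lexical embedding, and it replaces the paper's neural networks with hypothetical hand-set logistic scorers. The function names, weights, the averaging weight `alpha`, and the 0.5 threshold are all assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical stand-in for a trained per-word classifier: P(word is part of an EC)."""
    if len(features) != len(weights):
        raise ValueError("feature and weight dimensions differ")
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)


def _check_aligned(acoustic: List[Vector], lexical: List[Vector]) -> None:
    # Both modalities must provide exactly one embedding per word.
    if len(acoustic) != len(lexical):
        raise ValueError("need one acoustic and one lexical embedding per word")


def early_fusion(acoustic: List[Vector], lexical: List[Vector],
                 weights: Vector, bias: float) -> List[float]:
    """Early fusion: concatenate each word's two embeddings, then score with ONE model."""
    _check_aligned(acoustic, lexical)
    return [logistic_score(a + l, weights, bias) for a, l in zip(acoustic, lexical)]


def late_fusion(acoustic: List[Vector], lexical: List[Vector],
                w_ac: Vector, b_ac: float, w_lex: Vector, b_lex: float,
                alpha: float = 0.5) -> List[float]:
    """Late fusion: score each modality with its own model, then average the probabilities."""
    _check_aligned(acoustic, lexical)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    fused = []
    for a, l in zip(acoustic, lexical):
        p_ac = logistic_score(a, w_ac, b_ac)      # acoustic-only probability
        p_lex = logistic_score(l, w_lex, b_lex)   # lexical-only probability
        fused.append(alpha * p_ac + (1.0 - alpha) * p_lex)
    return fused


def tag_carriers(words: List[str], probs: List[float], threshold: float = 0.5) -> List[str]:
    """Return the words whose (fused) probability reaches the threshold."""
    return [w for w, p in zip(words, probs) if p >= threshold]


# Toy usage: made-up 2-d acoustic and 2-d lexical embeddings, one pair per word.
words = ["lost", "my", "job"]
acoustic = [[1.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
lexical = [[2.0, 0.0], [-1.0, 0.0], [0.0, 2.0]]

early = early_fusion(acoustic, lexical, [1.0, 0.0, 1.0, 0.0], 0.0)
late = late_fusion(acoustic, lexical, [1.0, 0.0], 0.0, [1.0, 0.0], 0.0)
print("early:", tag_carriers(words, early))  # early: ['lost', 'job']
print("late:", tag_carriers(words, late))    # late: ['lost', 'job']
```

The design difference is where the modalities meet: early fusion lets a single model see both embeddings at once and learn cross-modal interactions, while late fusion keeps separate per-modality models and combines only their output probabilities. The Findings above credit late fusion with improved detection accuracy for this task.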
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Speech Recognition and Synthesis · Emotion and Mood Recognition
