AmbER$^2$: Dual Ambiguity-Aware Emotion Recognition Applied to Speech and Text
Jingyao Wu, Grace Lin, Yinuo Song, Rosalind Picard

TL;DR
This paper introduces AmbER$^2$, a novel dual ambiguity-aware framework for emotion recognition that explicitly models both rater and modality ambiguity, leading to improved performance on speech and text datasets.
Contribution
It proposes a dual ambiguity-aware approach with a teacher-student architecture that explicitly models rater and modality ambiguity in emotion recognition.
Findings
AmbER$^2$ consistently improves distributional fidelity over conventional cross-entropy baselines.
It achieves performance competitive with, or superior to, recent state-of-the-art systems on IEMOCAP and MSP-Podcast.
Modeling ambiguity benefits highly uncertain samples.
Abstract
Emotion recognition is inherently ambiguous, with uncertainty arising both from rater disagreement and from discrepancies across modalities such as speech and text. There is growing interest in modeling rater ambiguity using label distributions. However, modality ambiguity remains underexplored, and multimodal approaches often rely on simple feature fusion without explicitly addressing conflicts between modalities. In this work, we propose AmbER$^2$, a dual ambiguity-aware framework that simultaneously models rater-level and modality-level ambiguity through a teacher-student architecture with a distribution-wise training objective. Evaluations on IEMOCAP and MSP-Podcast show that AmbER$^2$ consistently improves distributional fidelity over conventional cross-entropy baselines and achieves performance competitive with, or superior to, recent state-of-the-art systems. For example, on…
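To make the idea of modeling rater ambiguity with label distributions concrete, here is a minimal sketch. It is not the paper's implementation. The label set, the example votes, and the choice of KL divergence as the distribution-wise measure are all assumptions made for illustration, since the abstract does not name the exact objective. The sketch turns individual rater votes into a soft label and compares two hypothetical predictions under a majority-vote cross-entropy and under a distribution-wise divergence.

```python
import math
from collections import Counter

# Hypothetical label set; the paper's actual classes may differ.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def soft_label(votes, classes=EMOTIONS):
    """Turn a list of rater votes into a probability distribution over classes."""
    counts = Counter(votes)
    total = len(votes)
    return [counts[c] / total for c in classes]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); terms with zero target mass contribute nothing."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


def majority_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy against the majority-vote class only (rater disagreement is discarded)."""
    majority = max(range(len(target)), key=target.__getitem__)
    return -math.log(max(predicted[majority], eps))


# Three hypothetical raters disagree: two say "happy", one says "neutral".
votes = ["happy", "happy", "neutral"]
target = soft_label(votes)

# Two hypothetical model outputs: one overconfident, one that mirrors the raters.
overconfident = [0.01, 0.97, 0.01, 0.01]
rater_like = [0.02, 0.64, 0.32, 0.02]

for name, pred in [("overconfident", overconfident), ("rater_like", rater_like)]:
    print(
        f"{name}: majority CE = {majority_cross_entropy(target, pred):.3f}, "
        f"KL = {kl_divergence(target, pred):.3f}"
    )
```

Under these assumed inputs, the majority-vote cross-entropy favors the overconfident prediction, while the distribution-wise divergence favors the prediction that reflects the raters' disagreement. That contrast is what a measure of distributional fidelity is meant to capture. The teacher-student component and the modality-level ambiguity described in the abstract are not modeled in this sketch.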
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Speech Recognition and Synthesis
