AmbER$^2$: Dual Ambiguity-Aware Emotion Recognition Applied to Speech and Text

Jingyao Wu; Grace Lin; Yinuo Song; Rosalind Picard

arXiv:2601.18010·eess.AS·January 27, 2026

Open Access

TL;DR

This paper introduces AmbER$^2$, a dual ambiguity-aware framework for emotion recognition that explicitly models both rater ambiguity and modality ambiguity, improving distributional fidelity on speech and text datasets.

Contribution

It proposes a dual ambiguity-aware approach with a teacher-student architecture that explicitly models rater and modality ambiguity in emotion recognition.

Findings

01

AmbER$^2$ consistently improves distributional fidelity over conventional cross-entropy baselines.

02

Achieves performance competitive with, or superior to, recent state-of-the-art systems on the IEMOCAP and MSP-Podcast datasets.

03

Modeling ambiguity benefits highly uncertain samples.

Abstract

Emotion recognition is inherently ambiguous, with uncertainty arising both from rater disagreement and from discrepancies across modalities such as speech and text. There is growing interest in modeling rater ambiguity using label distributions. However, modality ambiguity remains underexplored, and multimodal approaches often rely on simple feature fusion without explicitly addressing conflicts between modalities. In this work, we propose AmbER$^2$, a dual ambiguity-aware framework that simultaneously models rater-level and modality-level ambiguity through a teacher-student architecture with a distribution-wise training objective. Evaluations on IEMOCAP and MSP-Podcast show that AmbER$^2$ consistently improves distributional fidelity over conventional cross-entropy baselines and achieves performance competitive with, or superior to, recent state-of-the-art systems. For example, on…
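
To make the notions of a rater label distribution and a distribution-wise objective more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the loss, so the KL-divergence terms, the four-class label set, the mixing weight `alpha`, and every function name below are assumptions chosen purely for illustration. The teacher distribution stands in for whatever signal a teacher model would provide in a teacher-student setup.

```python
import math
from collections import Counter

# Assumed label set for illustration; the paper's actual classes may differ.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def rater_distribution(votes, classes=EMOTIONS):
    """Convert individual rater votes into a soft label distribution."""
    counts = Counter(votes)
    total = sum(counts[c] for c in classes)
    if total == 0:
        raise ValueError("no votes fall within the known classes")
    return [counts[c] / total for c in classes]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), using the convention 0 * log(0) = 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def student_loss(student_logits, rater_dist, teacher_dist, alpha=0.5):
    """Hypothetical distribution-wise loss for a student model.

    Mixes the divergence from the rater distribution with the divergence
    from a teacher's predicted distribution. Both the mixing scheme and
    alpha are assumptions, not taken from the paper.
    """
    q = softmax(student_logits)
    return (alpha * kl_divergence(rater_dist, q)
            + (1 - alpha) * kl_divergence(teacher_dist, q))


# Three raters disagree: two say "happy", one says "neutral".
target = rater_distribution(["happy", "happy", "neutral"])
teacher = [0.1, 0.6, 0.2, 0.1]  # made-up teacher output
loss = student_loss([-5.0, 2.0, 1.3, -5.0], target, teacher)
```

Unlike a cross-entropy loss on a single majority-vote label, a target built this way keeps the minority "neutral" vote, so a student that assigns some probability to it is not penalized for doing so.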

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Speech Recognition and Synthesis