Speech Emotion Recognition with ASR Integration

Yuanchao Li

arXiv:2601.17901 · eess.AS · March 3, 2026

Open Access

TL;DR

This thesis explores integrating Automatic Speech Recognition (ASR) into Speech Emotion Recognition (SER) systems to make them more robust and applicable in real-world scenarios, where current speech and language technologies fall short.

Contribution

It introduces a novel approach combining ASR with SER to enhance emotion recognition performance in spontaneous and low-resource environments.

Findings

01 Improved emotion recognition accuracy in real-world conditions

02 Enhanced robustness of SER systems through ASR integration

03 Potential for scalable emotion recognition in practical applications

Abstract

Speech Emotion Recognition (SER) plays a pivotal role in understanding human communication, enabling emotionally intelligent systems, and serving as a fundamental component in the development of Artificial General Intelligence (AGI). However, deploying SER in real-world, spontaneous, and low-resource scenarios remains a significant challenge due to the complexity of emotional expression and the limitations of current speech and language technologies. This thesis investigates the integration of Automatic Speech Recognition (ASR) into SER, with the goal of enhancing the robustness, scalability, and practical applicability of emotion recognition from spoken language.

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Music and Audio Processing