Speech Emotion Recognition with ASR Integration
Yuanchao Li

TL;DR
This thesis explores integrating Automatic Speech Recognition (ASR) into Speech Emotion Recognition (SER) systems to improve their robustness and applicability in real-world, spontaneous, and low-resource scenarios.
Contribution
It investigates combining ASR with SER to improve emotion recognition from spoken language in spontaneous and low-resource settings.
Findings
Improved emotion recognition accuracy in real-world conditions
Enhanced robustness of SER systems through ASR integration
Potential for scalable emotion recognition in practical applications
Abstract
Speech Emotion Recognition (SER) plays a pivotal role in understanding human communication, enabling emotionally intelligent systems, and serving as a fundamental component in the development of Artificial General Intelligence (AGI). However, deploying SER in real-world, spontaneous, and low-resource scenarios remains a significant challenge due to the complexity of emotional expression and the limitations of current speech and language technologies. This thesis investigates the integration of Automatic Speech Recognition (ASR) into SER, with the goal of enhancing the robustness, scalability, and practical applicability of emotion recognition from spoken language.
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Music and Audio Processing
