Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition
Xu Zhang, Longbing Cao, Runze Yang, Zhangkai Wu

TL;DR
PhysioSER introduces a physiology-informed vocal spectrotemporal representation method that enhances speech emotion recognition by modeling amplitude and phase dynamics based on voice anatomy, leading to interpretable and efficient emotion detection.
Contribution
The paper presents PhysioSER, a novel framework that integrates physiological voice features with deep learning for improved, interpretable speech emotion recognition.
Findings
Effective across 14 datasets and 10 languages
Validated in real-time humanoid robot deployment
Outperforms existing models in interpretability and efficiency
Abstract
Speech emotion recognition (SER) is essential for humanoid robot tasks such as social robotic interactions and robotic psychological diagnosis, where interpretable and efficient models are critical for safety and performance. Existing deep models trained on large datasets remain largely uninterpretable, often insufficiently modeling underlying emotional acoustic signals and failing to capture and analyze the core physiology of emotional vocal behaviors. Physiological research on human voices shows that the dynamics of vocal amplitude and phase correlate with emotions through the vocal tract filter and the glottal source. However, most existing deep models solely involve amplitude but fail to couple the physiological features of and between amplitude and phase. Here, we propose PhysioSER, a physiology-informed vocal spectrotemporal representation learning method, to address these issues…
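To make the amplitude/phase distinction in the abstract concrete, here is a minimal sketch of splitting a waveform into per-frame amplitude and phase with a short-time Fourier transform. This is not PhysioSER's representation, which the truncated abstract does not specify; the function name `stft_amplitude_phase`, the parameters `frame_len` and `hop`, and the synthetic 440 Hz test tone are all assumptions made for illustration.

```python
import numpy as np


def stft_amplitude_phase(signal, frame_len=256, hop=128):
    """Hypothetical helper: return (amplitude, phase) per frame.

    Both arrays have shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT of each frame gives a complex spectrum.
    spectrum = np.fft.rfft(frames, axis=1)
    # Magnitude is the amplitude; the complex angle is the phase in radians.
    return np.abs(spectrum), np.angle(spectrum)


# Synthetic stand-in for a voiced sound: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

amplitude, phase = stft_amplitude_phase(signal)
print(amplitude.shape, phase.shape)
```

Amplitude-only pipelines keep `amplitude` (often after a mel or log transform) and discard `phase`; the abstract's argument is that the phase dynamics carry emotion-relevant information as well.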
Taxonomy
Topics: Emotion and Mood Recognition · Voice and Speech Disorders · Speech Recognition and Synthesis
