Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics
Ziqian Zhang, Min Huang, Zhongzhe Xiao

TL;DR
This paper explores physiological speech production data, such as electroglottography (EGG) and electromagnetic articulography (EMA), for speech emotion recognition (SER). It introduces a new dataset and reports that physiological features improve SER accuracy.
Contribution
The paper introduces STEM-E2VA, a portrayed emotional dataset that pairs audio with physiological data (EGG and EMA), and investigates whether estimated physiological information can stand in for collected data in SER.
Findings
Physiological data improves emotion recognition accuracy.
Estimated physiological features are viable substitutes for collected data.
The approach shows promise for real-world SER applications.
Abstract
Speech emotion recognition (SER) has advanced significantly thanks to deep-learning methods, while textual information further enhances its performance. However, few studies have focused on the physiological information during speech production, which also encompasses speaker traits, including emotional states. To bridge this gap, we conducted a series of experiments to investigate the potential of phonation excitation information and articulatory kinematics for SER. Due to the scarcity of training data for this purpose, we introduce a portrayed emotional dataset, STEM-E2VA, which includes audio and physiological data such as electroglottography (EGG) and electromagnetic articulography (EMA). EGG and EMA provide information on phonation excitation and articulatory kinematics, respectively. Additionally, we performed emotion recognition using estimated physiological data…
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Phonetics and Phonology Research
