Contribution of Glottal Waveform in Speech Emotion: A Comparative Pairwise Investigation
Zhongzhe Xiao, Ying Chen, Zhi Tao

TL;DR
This study examines how the glottal waveform contributes to the expression of emotion in speech, finding that it conveys most of the emotional cues and can distinguish certain pairs of emotional states with high accuracy.
Contribution
It provides a comparative analysis of emotional information in speech and glottal waveforms, highlighting the significance of glottal features in emotion recognition.
Findings
The glottal waveform conveys most of the emotional cues present in speech.
92.45% accuracy in distinguishing intense anger from moderate sadness.
The glottal waveform represents valence cues better than arousal cues.
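The study uses seven states: moderate and intense anger, joy, and sadness, plus neutral. Pairwise classification over these states therefore means C(7,2) = 21 binary tasks, and intense anger versus moderate sadness is one of them. The Python sketch below enumerates those tasks and computes, for each pair, the gap between accuracy on speech features and accuracy on glottal features. `accuracy_fn` is a hypothetical callback that stands in for whatever feature extraction and classifier one trains. It is not the paper's pipeline.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

FAMILIES = ["anger", "joy", "sadness"]
INTENSITIES = ["moderate", "intense"]

Pair = Tuple[str, str]


def emotional_states() -> List[str]:
    """The seven states from the study: two intensities per family, plus neutral."""
    return [f"{level} {family}" for family in FAMILIES for level in INTENSITIES] + ["neutral"]


def pairwise_tasks(states: List[str]) -> List[Pair]:
    """Every unordered pair of states becomes one binary classification task."""
    return list(combinations(states, 2))


def accuracy_gaps(tasks: List[Pair], accuracy_fn: Callable[[Pair, str], float]) -> Dict[Pair, float]:
    """Speech-feature accuracy minus glottal-feature accuracy for each task.

    accuracy_fn(pair, feature_set) is a hypothetical user-supplied function that
    trains and evaluates a binary classifier on the given feature set
    ("speech" or "glottal") and returns an accuracy in [0, 1].
    """
    return {pair: accuracy_fn(pair, "speech") - accuracy_fn(pair, "glottal") for pair in tasks}


if __name__ == "__main__":
    tasks = pairwise_tasks(emotional_states())
    print(len(tasks))  # 21
```

The abstract cites small gaps across most pairs as evidence that the glottal signal alone carries most of the emotional cues.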
Abstract
In this work, we investigated the contribution of the glottal waveform to human vocal emotion expression. Seven emotional states are considered, using speech samples in Mandarin Chinese: moderate and intense versions of three emotional families (anger, joy, and sadness), plus a neutral state. The glottal waveforms extracted from speech samples of the different emotional states are first analyzed in both the time domain and the frequency domain to identify their differences. Comparative emotion classifications are then carried out on features extracted from the original whole speech signal and from the glottal waveform alone. In experiments generating a performance-driven hierarchical classifier architecture, and in pairwise classification of individual emotional states, the small difference between the accuracies obtained from the speech signal and from the glottal signal proved that a majority of emotional cues in…
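This summary does not say how the glottal waveform was extracted. One common textbook approach is LPC inverse filtering. It works in three steps:

1. Fit an all-pole vocal-tract model to a speech frame.
2. Filter the frame through the inverse of that model to get a rough estimate of the glottal flow derivative.
3. Integrate that estimate to undo lip radiation.

The plain-Python sketch below implements that idea under stated assumptions. The function names, the `fs // 1000 + 2` model order, and the leak factor `rho = 0.99` are illustrative choices, not the authors' settings. Practical methods such as iterative adaptive inverse filtering also add pre-emphasis, glottal-source pre-compensation, and pitch-synchronous analysis.

```python
import math
import random
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of a finite frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def lpc(frame: Sequence[float], order: int) -> List[float]:
    """All-pole model A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p via Levinson-Durbin.

    The frame is Hamming-windowed before the autocorrelation is computed.
    """
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    r = autocorrelation([s * w for s, w in zip(frame, window)], order)
    a = [1.0] + [0.0] * order
    err = r[0]
    if err <= 0.0:
        return a  # silent frame: inverse filter stays the identity
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # prediction error shrinks at each stage
    return a


def inverse_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Residual e[n] = sum_k a[k] x[n-k], i.e. x filtered by A(z)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0) for n in range(len(x))]


def leaky_integrate(e: Sequence[float], rho: float = 0.99) -> List[float]:
    """g[n] = e[n] + rho * g[n-1]; rho < 1 keeps low-frequency drift bounded."""
    out: List[float] = []
    prev = 0.0
    for v in e:
        prev = v + rho * prev
        out.append(prev)
    return out


def estimate_glottal_flow(frame: Sequence[float], fs: int, rho: float = 0.99) -> List[float]:
    """Rough glottal-flow estimate for one voiced frame (illustrative only)."""
    order = fs // 1000 + 2  # rule-of-thumb LPC order, not the paper's setting
    a = lpc(frame, order)
    derivative = inverse_filter(frame, a)  # approx. glottal flow derivative
    return leaky_integrate(derivative, rho)  # integrate to undo lip radiation


if __name__ == "__main__":
    random.seed(0)
    # Synthetic AR(2) signal driven by white noise, used only to exercise the code.
    x = [0.0, 0.0]
    for _ in range(4000):
        x.append(1.3 * x[-1] - 0.4 * x[-2] + random.gauss(0.0, 1.0))
    print([round(c, 2) for c in lpc(x, 2)])  # should be near [1.0, -1.3, 0.4]
```

The leaky integrator is used instead of a pure running sum so that small DC offsets in the residual do not accumulate without bound.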
Taxonomy
Topics: Emotion and Mood Recognition · Speech and Audio Processing · Music and Audio Processing
