Speech Emotion Recognition Considering Local Dynamic Features
Haotian Guan, Zhilei Liu, Longbiao Wang, Jianwu Dang, Ruiguo Yu

TL;DR
This paper introduces a novel local dynamic pitch distribution feature for speech emotion recognition, demonstrating improved accuracy by capturing prosodic variations often overlooked by global features.
Contribution
The paper proposes a new local dynamic feature based on pitch distribution histograms, enhancing emotion recognition accuracy over traditional global feature methods.
Findings
Local dynamic features outperform global features in recognition accuracy
Experiments on the Berlin Database of Emotional Speech validate the effectiveness of the proposed method
The method captures prosodic variations crucial for emotion detection
Abstract
Recently, increasing attention has been directed to the study of speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate content differences. However, the expression of speech emotion is a dynamic process, reflected in the dynamic durations, energies, and other prosodic information of a speaker's utterance. In this paper, a novel local dynamic pitch probability distribution feature, obtained by drawing a histogram, is proposed to improve the accuracy of speech emotion recognition. Compared with most previous work using global features, the proposed method takes advantage of the local dynamic information conveyed by emotional speech. Several experiments on the Berlin Database of Emotional Speech are conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that the…
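To make the idea of a local pitch probability distribution concrete, here is a minimal sketch, not the authors' implementation. It assumes a per-frame pitch (F0) contour has already been extracted, with 0 marking unvoiced frames, splits it into a few consecutive segments, and builds a normalized histogram for each. The function name `local_pitch_histograms`, the segment count, the bin count, and the 50 to 500 Hz range are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def local_pitch_histograms(f0, n_segments=3, n_bins=9, f0_range=(50.0, 500.0)):
    """Concatenate per-segment pitch histograms into one feature vector.

    f0: per-frame pitch values in Hz; 0 marks an unvoiced frame
        (an assumed convention, not taken from the paper).
    """
    f0 = np.asarray(f0, dtype=float)
    features = []
    # array_split tolerates contours whose length is not divisible by n_segments
    for segment in np.array_split(f0, n_segments):
        voiced = segment[segment > 0]  # ignore unvoiced frames
        counts, _ = np.histogram(voiced, bins=n_bins, range=f0_range)
        total = counts.sum()
        # Normalize counts to a probability distribution; all zeros if no voiced frames
        probs = counts / total if total > 0 else np.zeros(n_bins)
        features.append(probs)
    return np.concatenate(features)

# Toy contour: three segments of three frames each
contour = [0, 100, 100, 200, 0, 0, 300, 300, 450]
feature_vector = local_pitch_histograms(contour)
print(feature_vector.shape)
```

A global feature would collapse the whole contour into one histogram or a few statistics. Keeping one distribution per segment preserves how the pitch distribution changes over the course of the utterance, which is the local dynamic information the abstract refers to.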
