Research on several key technologies in practical speech emotion recognition
Chengwei Huang

TL;DR
This paper investigates practical speech emotion recognition using various techniques and models, including Gaussian mixture models, context embedding, noise reduction, rejection of unknown emotions, speaker normalization, and multimodal data, to improve robustness and applicability.
Contribution
It combines multiple techniques and models for practical speech emotion recognition, including a proposed Gaussian mixture model embedded with Markov networks and the use of multimodal signals.
Findings
Effective emotion recognition with Gaussian mixture models and context embedding.
Enhanced robustness through noise reduction and emotion rejection methods.
First integration of electrocardiogram signals for multimodal emotion recognition.
Abstract
This dissertation studies practical speech emotion recognition, covering several cognition-related emotion types, namely fidgetiness, confidence, and tiredness. High-quality naturalistic emotional speech data is the basis of this research. The following techniques are used to induce practical emotional speech: cognitive tasks, computer games, noise stimulation, sleep deprivation, and movie clips. A practical speech emotion recognition system based on the Gaussian mixture model is studied. A set of two-class classifiers is adopted to improve performance in the small-sample case. To account for context information in continuous emotional speech, a Gaussian mixture model embedded with Markov networks is proposed. A further study analyzes system robustness. First, a noise reduction algorithm based on auditory masking properties is first…
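As a rough illustration of the Gaussian-mixture approach the abstract describes, the sketch below fits one diagonal-covariance GMM per emotion with EM and labels an utterance by the highest mean frame log-likelihood. It is not the dissertation's system: the feature vectors, the function names (`fit_diag_gmm`, `train_models`, `classify`), the number of components, and the simple likelihood threshold used to reject unknown emotions are all assumptions made for this example, and the two-class classifier set and the Markov-network embedding are not reproduced.

```python
import numpy as np


def component_log_prob(X, weights, means, variances):
    """Log of weight * Gaussian density for every frame and component: (n, k)."""
    diff = X[:, None, :] - means[None, :, :]                      # (n, k, d)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    return np.log(weights) + log_gauss


def log_likelihood(X, gmm):
    """Per-frame log-likelihood under a GMM given as (weights, means, variances)."""
    log_p = component_log_prob(X, *gmm)
    m = log_p.max(axis=1, keepdims=True)                          # stable log-sum-exp
    return (m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))).ravel()


def fit_diag_gmm(X, n_components=2, n_iter=50, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to X of shape (n_frames, n_features) with EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Initialise means from random frames and variances from the global variance.
    means = X[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = component_log_prob(X, weights, means, variances)
        log_norm = log_likelihood(X, (weights, means, variances))
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate weights, means, and floored variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, var_floor)
    return weights, means, variances


def train_models(train_data, **gmm_kwargs):
    """Fit one GMM per emotion label from a dict {label: frames}."""
    return {label: fit_diag_gmm(X, **gmm_kwargs) for label, X in train_data.items()}


def classify(frames, models, reject_below=None):
    """Pick the emotion whose GMM gives the highest mean frame log-likelihood.

    If reject_below is set and even the best score falls under it, the
    utterance is labelled "unknown" (an assumed, simplistic rejection rule).
    """
    scores = {label: float(log_likelihood(frames, gmm).mean())
              for label, gmm in models.items()}
    best = max(scores, key=scores.get)
    if reject_below is not None and scores[best] < reject_below:
        return "unknown", scores
    return best, scores
```

In use, `train_models` would take a dictionary mapping each emotion label (for example fidgetiness, confidence, tiredness) to an array of acoustic feature frames, and `classify` would be called once per utterance. The rejection threshold has to be tuned on held-out data; the value is not something the abstract specifies.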
Taxonomy
Topics: Emotion and Mood Recognition · Speech and Audio Processing · Infant Health and Development
