DNN-HMM based Speaker Adaptive Emotion Recognition using Proposed Epoch and MFCC Features
Md. Shah Fahad, Jainath Yadav, Gyadhar Pradhan, Akshay Deepak

TL;DR
This paper introduces a novel speech emotion recognition method that combines excitation source features derived from epochs with Mel-frequency cepstral coefficient (MFCC) features in DNN-HMM models, improving accuracy on the IEMOCAP database.
Contribution
The study proposes a new feature set that combines epoch-based excitation source features with MFCCs for emotion recognition, yielding higher accuracy than either feature set alone.
Findings
Combined features improve accuracy to 64.2%.
Epoch features alone achieve 54.52% accuracy.
MFCC features alone achieve 59.25% accuracy.
Abstract
Speech is produced when a time-varying vocal tract system is excited by a time-varying excitation source. Therefore, the information present in speech, such as the message, emotion, language, and speaker identity, results from the combined effect of both the excitation source and the vocal tract system. However, excitation source features have seen very little use in emotion recognition. In our earlier work, we proposed a novel method to extract glottal closure instants (GCIs), known as epochs. In this paper, we explore epoch features, namely the instantaneous pitch, phase, and strength of epochs, for discriminating emotions. We combine these excitation source features with the well-known Mel-frequency cepstral coefficient (MFCC) features to develop an emotion recognition system with improved performance. DNN-HMM speaker adaptive models have been developed using MFCC, epoch, and combined features.…
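The abstract does not spell out how epoch and MFCC features are combined. The sketch below assumes one simple possibility, frame-level feature fusion. Instantaneous pitch is computed from the interval between consecutive epochs (fs divided by the interval in samples). Pitch and epoch strength are then averaged over the epochs falling inside each analysis frame and concatenated onto that frame's MFCC vector. The authors' epoch extraction method is not reproduced: epoch locations and strengths are taken as given inputs, and the phase feature is omitted. The function names, the 25 ms / 10 ms framing at 16 kHz, the per-frame averaging, and zero-filling of frames without epochs are all hypothetical illustration choices, not the paper's procedure.

```python
from typing import List, Sequence


def instantaneous_pitch(epochs: Sequence[int], fs: int) -> List[float]:
    """Instantaneous pitch (Hz) from consecutive epoch locations (in samples).

    Value i is fs / (epochs[i + 1] - epochs[i]), so the result is aligned
    with epochs[1:] and has len(epochs) - 1 entries.
    """
    return [fs / (b - a) for a, b in zip(epochs, epochs[1:])]


def frame_epoch_features(
    epochs: Sequence[int],
    strengths: Sequence[float],
    fs: int,
    n_frames: int,
    win: int = 400,  # assumed 25 ms window at 16 kHz (hypothetical choice)
    hop: int = 160,  # assumed 10 ms hop at 16 kHz (hypothetical choice)
) -> List[List[float]]:
    """Per-frame [mean pitch, mean epoch strength] over epochs inside each frame.

    Frames containing no epoch (e.g. unvoiced or silent regions) get zeros,
    which is an assumption of this sketch.
    """
    if len(strengths) != len(epochs):
        raise ValueError("need one strength value per epoch")
    pitch = instantaneous_pitch(epochs, fs)
    feats = []
    for f in range(n_frames):
        start, end = f * hop, f * hop + win
        # Index j refers to epochs[j + 1], the epoch that closes interval j.
        idx = [j for j, e in enumerate(epochs[1:]) if start <= e < end]
        if idx:
            p = sum(pitch[j] for j in idx) / len(idx)
            s = sum(strengths[j + 1] for j in idx) / len(idx)
        else:
            p, s = 0.0, 0.0
        feats.append([p, s])
    return feats


def fuse_features(
    mfcc_frames: Sequence[Sequence[float]],
    epoch_frames: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Feature-level fusion: concatenate MFCC and epoch vectors frame by frame."""
    if len(mfcc_frames) != len(epoch_frames):
        raise ValueError("frame counts differ")
    return [list(m) + list(e) for m, e in zip(mfcc_frames, epoch_frames)]


if __name__ == "__main__":
    # Hypothetical epochs spaced 100 samples apart at 16 kHz (160 Hz pitch).
    epochs = [0, 100, 200, 300]
    strengths = [1.0, 0.5, 0.5, 0.5]
    ep = frame_epoch_features(epochs, strengths, fs=16000, n_frames=2)
    mfcc = [[1.0, 2.0], [3.0, 4.0]]  # placeholder 2-dim "MFCC" frames
    print(fuse_features(mfcc, ep))  # [[1.0, 2.0, 160.0, 0.5], [3.0, 4.0, 160.0, 0.5]]
```

In a real system the fused vectors would replace plain MFCC frames as input to the acoustic model. Concatenation is the simplest fusion choice; score-level fusion of separate MFCC and epoch models is an equally plausible alternative that the abstract does not rule out.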
