Deep Residual Local Feature Learning for Speech Emotion Recognition
Sattaya Singkul, Thakorn Chatchaisathaporn, Boontawee Suntisrivaraporn, and Kuntpong Woraratpanya

TL;DR
This paper introduces DeepResLFLB, a deep residual local feature learning block, to enhance speech emotion recognition (SER) by mitigating the vanishing gradient problem in deep models and improving accuracy on the EMODB and RAVDESS benchmark datasets.
Contribution
It proposes a novel deep residual local feature learning block (DeepResLFLB) that cascades three blocks (an LFLB, a residual LFLB called ResLFLB, and a multilayer perceptron) to improve SER performance and training efficiency.
Findings
Significant improvements in accuracy, precision, recall, and F1-score on the EMODB and RAVDESS datasets.
Effective mitigation of the vanishing gradient problem in deep SER models.
Enhanced local and hierarchical feature learning for speech emotion recognition.
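The vanishing-gradient finding rests on the standard residual-learning argument (the general ResNet-style reasoning, not a derivation stated in this summary). For a block whose output adds its input back to a learned transformation F, the chain rule gives:

```latex
\mathbf{x}_{l+1} = \mathbf{x}_l + \mathcal{F}(\mathbf{x}_l)
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial \mathbf{x}_l}
  = \frac{\partial \mathcal{L}}{\partial \mathbf{x}_{l+1}}
    \left( \mathbf{I} + \frac{\partial \mathcal{F}(\mathbf{x}_l)}{\partial \mathbf{x}_l} \right)
```

Here L is the training loss and I is the identity matrix. The identity term gives the gradient a direct path through every residual block, so it does not have to pass only through the Jacobians of F, whose product can shrink toward zero as depth grows. This holds exactly only when the skip path is a pure identity; an activation or pooling layer applied after the addition contributes further factors.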
Abstract
Speech Emotion Recognition (SER) is taking on a key role in global business today as a way to improve service efficiency, for example in call centers. Recent SER systems are based on deep learning. However, the effectiveness of deep learning depends on the number of layers: the deeper the network, the higher the effectiveness. On the other hand, deeper networks suffer from the vanishing gradient problem, slow learning, and long training times. Therefore, this paper proposes a redesign of the existing local feature learning block (LFLB). The new design is called a deep residual local feature learning block (DeepResLFLB). DeepResLFLB consists of three cascaded blocks: LFLB, a residual local feature learning block (ResLFLB), and a multilayer perceptron (MLP). LFLB is built for learning local correlations along with extracting hierarchical correlations; DeepResLFLB can take advantage of repeatedly learning…
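The three-block cascade described in the abstract (LFLB, then ResLFLB, then MLP) can be sketched as a NumPy forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation. The LFLB follows the layout commonly used for LFLBs in earlier SER work (convolution, normalization, ELU, max pooling). The two-convolution ResLFLB with an identity skip, the averaging over time before the MLP, the 40-dimensional input frames, and all layer sizes and function names are hypothetical choices. Normalization is a parameter-free per-channel stand-in for batch normalization, and the weights are random, so the output probabilities carry no meaning until the model is trained.

```python
import numpy as np

# Minimal forward-pass sketch of an LFLB -> ResLFLB -> MLP cascade.
# All function names, layer sizes, and the ResLFLB layout are hypothetical
# illustrations, not the authors' code.
rng = np.random.default_rng(0)


def conv1d(x, w, b):
    """'Same'-padded 1-D convolution over time. x: (T, C_in), w: (K, C_in, C_out)."""
    k = w.shape[0]
    left = k // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))
    out = np.empty((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        # Contract the (K, C_in) window against the kernel -> (C_out,)
        out[t] = np.tensordot(xp[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return out


def norm(x, eps=1e-5):
    """Per-channel normalization over time (parameter-free stand-in for BatchNorm)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)


def elu(x, alpha=1.0):
    # expm1 on the clipped input avoids overflow in the unused branch
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def max_pool(x, size=2):
    """Non-overlapping max pooling along time; drops a ragged tail."""
    t = x.shape[0] // size * size
    return x[:t].reshape(-1, size, x.shape[1]).max(axis=1)


def init_conv(k, c_in, c_out):
    return rng.normal(0.0, 1.0 / np.sqrt(k * c_in), (k, c_in, c_out)), np.zeros(c_out)


def lflb(x, p):
    """Conventional LFLB layout: conv -> norm -> ELU -> max pool."""
    w, b = p
    return max_pool(elu(norm(conv1d(x, w, b))))


def res_lflb(x, p):
    """Assumed residual LFLB: two conv layers plus an identity skip before pooling."""
    (w1, b1), (w2, b2) = p
    h = elu(norm(conv1d(x, w1, b1)))
    h = norm(conv1d(h, w2, b2))
    return max_pool(elu(h + x))  # identity skip needs matching channel counts


def mlp(v, p):
    """Two dense layers with a softmax output over emotion classes."""
    (w1, b1), (w2, b2) = p
    logits = elu(v @ w1 + b1) @ w2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()


def deep_res_lflb(frames, params):
    h = lflb(frames, params["lflb"])
    h = res_lflb(h, params["res"])
    return mlp(h.mean(axis=0), params["mlp"])  # average over time (assumed)


N_FEATS, CH, HIDDEN, N_CLASSES = 40, 16, 32, 7  # 7 classes, as in EMODB
params = {
    "lflb": init_conv(3, N_FEATS, CH),
    "res": (init_conv(3, CH, CH), init_conv(3, CH, CH)),
    "mlp": ((rng.normal(0.0, 0.1, (CH, HIDDEN)), np.zeros(HIDDEN)),
            (rng.normal(0.0, 0.1, (HIDDEN, N_CLASSES)), np.zeros(N_CLASSES))),
}
frames = rng.normal(size=(100, N_FEATS))  # random placeholder for 100 feature frames
probs = deep_res_lflb(frames, params)
print(probs.shape, int(probs.argmax()))
```

The skip connection in `res_lflb` adds the block input to its output before the final activation, which is why the sketch keeps the ResLFLB channel count equal to its input; if the counts differed, a projection (for example a 1×1 convolution) would be needed on the skip path.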
