Searching for Effective Preprocessing Method and CNN-based Architecture with Efficient Channel Attention on Speech Emotion Recognition
Byunggun Kim, Younghun Kwon

TL;DR
This paper introduces an effective preprocessing strategy and a CNN model with efficient channel attention for speech emotion recognition, achieving state-of-the-art results despite limited training data.
Contribution
It proposes a novel combination of multi-resolution preprocessing and an ECA-augmented CNN architecture to improve SER performance with fewer parameters.
Findings
Increasing frequency resolution improves emotion recognition accuracy.
ECA blocks enhance channel feature representation efficiently.
Data augmentation with multiple preprocessing methods boosts performance.
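The frequency-time resolution trade-off behind these findings can be illustrated with a short sketch. This is not the authors' pipeline: the window lengths, the hop ratio, the sample rate, and the helper name `log_spectrogram` are all assumptions chosen for illustration, since this summary does not list the paper's eight preprocessing settings. The sketch only shows that a longer analysis window yields more frequency bins (finer frequency resolution) and fewer time frames.

```python
import numpy as np


def log_spectrogram(signal, win_length, hop_length):
    """Log-magnitude STFT with a Hann window (hypothetical helper).

    Returns an array of shape (frequency_bins, time_frames).
    """
    n_frames = 1 + (len(signal) - win_length) // hop_length
    window = np.hanning(win_length)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop_length: i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    # One-sided FFT per frame gives win_length // 2 + 1 frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-6).T


# Assumed test signal: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

# Hypothetical window lengths; the paper's actual settings may differ.
versions = {
    win_length: log_spectrogram(signal, win_length, hop_length=win_length // 4)
    for win_length in (256, 512, 1024, 2048)
}

for win_length, spec in versions.items():
    bin_width_hz = sample_rate / win_length
    print(f"window={win_length}: shape={spec.shape}, bin width={bin_width_hz:.2f} Hz")
```

Each entry in `versions` is a different view of the same utterance, so a set of such views could in principle serve both for comparing resolutions and for augmenting a small training set, which is the idea the findings describe.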
Abstract
Speech emotion recognition (SER) uses a computer model to classify human emotions in speech. Recently, SER performance has steadily improved as deep learning techniques have been adopted. However, unlike many other domains that use speech data, SER lacks sufficient training data. This causes the neural network to overfit during training, which degrades performance. Successful emotion recognition therefore requires an effective preprocessing method and a model structure that uses its weight parameters efficiently. In this study, we propose using eight dataset versions with different frequency-time resolutions to search for an effective emotional speech preprocessing method. To pursue an efficient model structure, we propose a 6-layer convolutional neural network (CNN) model with efficient channel attention (ECA). In particular, the well-positioned ECA blocks can…
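For readers unfamiliar with efficient channel attention, the general mechanism can be sketched in a few lines of NumPy. This follows the commonly described ECA formulation (global average pooling, a small 1D convolution across neighboring channels, a sigmoid gate), not the authors' 6-layer model; the function names, the random weights, and the feature-map shape are assumptions for illustration, and the kernel-size heuristic with `gamma=2, b=1` is the default from the ECA-Net reference implementation, which this paper may or may not use.

```python
import math

import numpy as np


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive kernel size heuristic (assumed defaults), forced to be odd."""
    k = int(abs((math.log2(channels) + b) / gamma))
    return k if k % 2 == 1 else k + 1


def eca_block(x, weights):
    """Apply channel attention to x of shape (channels, height, width).

    weights is a 1D kernel of odd length shared across all channels.
    """
    channels = x.shape[0]
    k = len(weights)
    # Global average pooling: one descriptor per channel.
    pooled = x.mean(axis=(1, 2))
    # Zero-pad so the 1D convolution keeps one output per channel.
    padded = np.pad(pooled, k // 2)
    # Local cross-channel interaction, with no dimensionality reduction.
    mixed = np.array([padded[c:c + k] @ weights for c in range(channels)])
    # Sigmoid gate in (0, 1), broadcast over the spatial dimensions.
    gate = 1.0 / (1.0 + np.exp(-mixed))
    return x * gate[:, None, None]


rng = np.random.default_rng(0)
features = rng.standard_normal((64, 8, 8))  # hypothetical CNN feature map
kernel = rng.standard_normal(eca_kernel_size(64))  # stand-in for learned weights
attended = eca_block(features, kernel)
print(attended.shape, len(kernel))
```

The block adds only `k` weights per insertion point, which is why it suits a setting where the parameter budget is constrained by limited training data.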
Taxonomy
Topics: Speech and Audio Processing · Internet of Things and Social Network Interactions
Methods: Softmax · Attention Is All You Need · Average Pooling · 1x1 Convolution · Residual Connection · Global Average Pooling · Sigmoid Activation · Efficient Channel Attention · Convolution
