Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition
Che-Wei Huang, Shrikanth S. Narayanan

TL;DR
This paper investigates four types of convolutional operations in deep convolutional recurrent neural networks for speech emotion recognition, analyzing how each performs under clean and noisy conditions to better understand their behavior and to reach state-of-the-art results.
Contribution
It provides a comprehensive analysis of different types of convolution in deep neural networks for speech emotion recognition, highlighting their effects and interactions in clean and noisy environments.
Findings
All convolution types achieved state-of-the-art performance on eNTERFACE'05.
A detailed module-wise performance analysis revealed how information flows through the network.
The study demonstrated the interplay between affective and irrelevant information during processing.
Abstract
Deep convolutional neural networks are being actively investigated in a wide range of speech and audio processing applications, including speech recognition, audio event detection and computational paralinguistics, owing to their ability to reduce factors of variation when learning from speech. However, studies have suggested favoring a certain type of convolutional operation when building a deep convolutional neural network for speech applications, although there have been promising results using different types of convolutional operations. In this work, we study four types of convolutional operations on different input features for speech emotion recognition under noisy and clean conditions in order to derive a comprehensive understanding. Since affective behavioral information has been shown to reflect temporally varying mental states and convolutional operations are applied locally…
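The abstract distinguishes convolution types partly by which local region of the input features a kernel covers. As a minimal illustration under our own assumptions (not the authors' implementation, and not a reproduction of the four specific types the paper studies), the pure-Python sketch contrasts two hypothetical variants on a toy (frequency x time) spectrogram: a small local time-frequency kernel that slides along both axes, and a kernel that spans every frequency bin and therefore slides only along time. The function names, the toy spectrogram `spec` and the kernels are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, k: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation (what deep-learning libraries call convolution).

    x is a (freq x time) feature map, e.g. a spectrogram (hypothetical input);
    k is a (kf x kt) kernel that slides over both axes with no padding.
    """
    n_freq, n_time = len(x), len(x[0])
    kf, kt = len(k), len(k[0])
    if kf > n_freq or kt > n_time:
        raise ValueError("kernel larger than input")
    out = []
    for f in range(n_freq - kf + 1):
        row = []
        for t in range(n_time - kt + 1):
            s = 0.0
            # Weighted sum over the local kf x kt patch starting at (f, t).
            for i in range(kf):
                for j in range(kt):
                    s += x[f + i][t + j] * k[i][j]
            row.append(s)
        out.append(row)
    return out


def full_spectrum_temporal_conv(x: Matrix, k: Matrix) -> List[float]:
    """Hypothetical variant: the kernel covers all frequency bins.

    Because the kernel height equals the number of frequency bins, it can
    only slide along time, so the output is a 1-D sequence of length
    n_time - kt + 1.
    """
    if len(k) != len(x):
        raise ValueError("kernel height must equal the number of frequency bins")
    return conv2d_valid(x, k)[0]


# Toy 4-bin x 5-frame "spectrogram" (made-up values for illustration only).
spec = [
    [1, 2, 3, 4, 5],
    [0, 1, 0, 1, 0],
    [2, 2, 2, 2, 2],
    [1, 0, 1, 0, 1],
]

local = conv2d_valid(spec, [[1, 0], [0, 1]])
print(len(local), len(local[0]))  # 3 4: frequency and time axes both kept
print(full_spectrum_temporal_conv(spec, [[1, 1, 1]] * 4))  # [15.0, 18.0, 21.0]
```

The design difference shows up in the output shapes: the local kernel preserves a (reduced) frequency axis alongside time, while the full-spectrum kernel collapses frequency entirely and leaves only a temporal sequence, which a recurrent layer could then consume frame by frame.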
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
