Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition

Che-Wei Huang; Shrikanth S. Narayanan

arXiv:1706.02901·cs.LG·January 16, 2018·25 cites

TL;DR

This paper studies four types of convolutional operations in deep convolutional recurrent neural networks for speech emotion recognition, comparing their performance under noisy and clean conditions and reporting state-of-the-art results on eNTERFACE'05.

Contribution

It provides a comprehensive analysis of four types of convolution in deep neural networks for speech emotion recognition, highlighting their effects and interactions in noisy and clean environments.

Findings

01. All convolution types achieved state-of-the-art performance on eNTERFACE'05.

02. Detailed module-wise performance analysis revealed insights into information flow.

03. The study demonstrated the interplay between affective and irrelevant information during processing.

Abstract

Deep convolutional neural networks are being actively investigated in a wide range of speech and audio processing applications, including speech recognition, audio event detection and computational paralinguistics, owing to their ability to reduce factors of variation when learning from speech. However, studies have suggested favoring a certain type of convolutional operation when building a deep convolutional neural network for speech applications, although there have been promising results using different types of convolutional operations. In this work, we study four types of convolutional operations on different input features for speech emotion recognition under noisy and clean conditions in order to derive a comprehensive understanding. Since affective behavioral information has been shown to reflect temporally varying mental states and convolutional operations are applied locally…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing