GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition
Jia-Xin Ye, Xin-Cheng Wen, Xuan-Ze Wang, Yong Xu, Yan Luo, Chang-Li Wu, Li-Yan Chen, Kun-Hong Liu

TL;DR
This paper introduces GM-TCNet, a novel neural network architecture that captures emotional causality in speech using multi-scale gated temporal convolutions, significantly improving speech emotion recognition accuracy.
Contribution
The paper proposes a new gated multi-scale temporal convolutional network with emotional causality learning, enhancing robustness and discriminability in speech emotion recognition.
Findings
Achieves state-of-the-art performance on SER datasets.
Effectively models emotional causality dynamics in speech.
Outperforms existing methods in most evaluation metrics.
Abstract
In human-computer interaction, Speech Emotion Recognition (SER) plays an essential role in understanding the user's intent and improving the interactive experience. Because emotionally similar utterances differ in speaker characteristics but share common antecedents and consequences, an essential challenge for SER is producing robust and discriminative representations by modeling the causality between speech emotions. In this paper, we propose a Gated Multi-scale Temporal Convolutional Network (GM-TCNet) that builds a novel emotional causality representation learning component with a multi-scale receptive field. This component, constructed from a dilated causal convolution layer and a gating mechanism, captures the dynamics of emotion across the time domain. In addition, the network uses skip connections to fuse high-level features from…
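The two building blocks named in the abstract, dilated causal convolution and gating, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the single-channel setup, the tanh/sigmoid form of the gate, and the example dilation schedule are all assumptions made for illustration. The paper's actual layer layout may differ.

```python
import math


def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution over a single channel.

    output[t] depends only on x[t], x[t - d], x[t - 2d], ... (never on the
    future). weights[0] multiplies the current step. Positions before the
    start of the sequence are treated as zeros (implicit left padding).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_block(x, w_filter, w_gate, dilation):
    """Gated causal convolution: tanh(filter path) * sigmoid(gate path).

    ASSUMPTION: the tanh/sigmoid gate is one common choice, used here only
    for illustration. A Gated Linear Unit would drop the tanh.
    """
    f = causal_dilated_conv(x, w_filter, dilation)
    g = causal_dilated_conv(x, w_gate, dilation)
    return [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]


def receptive_field(kernel_size, dilations):
    """Number of input steps visible to the last layer of a causal stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    signal = [0.5, -1.0, 2.0, 0.3]
    print(gated_block(signal, [0.4, -0.2], [0.1, 0.3], dilation=1))
    # Hypothetical dilation schedule, chosen only to show how stacking
    # layers with growing dilation widens the temporal context.
    print(receptive_field(2, [1, 2, 4, 8]))
```

Because each output step reads only current and past inputs, changing a later frame never alters earlier outputs. That property is what makes the convolution causal. Stacking layers with increasing dilation is the usual way to obtain a wide receptive field from small kernels.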
Taxonomy
Topics: Emotion and Mood Recognition · Speech and Audio Processing · Sentiment Analysis and Opinion Mining
Methods: Gated Linear Unit · 1x1 Convolution · Gated Convolution · Convolution · Dilated Causal Convolution · Causal Convolution
