Speech Emotion Recognition with Global-Aware Fusion on Multi-scale Feature Representation

Wenjing Zhu; Xiang Li

arXiv:2204.05571 · cs.SD · April 13, 2022

TL;DR

This paper introduces GLAM, a neural network architecture that improves speech emotion recognition by capturing multi-scale features and global emotional information, and it outperforms previous methods on the IEMOCAP benchmark.

Contribution

The paper proposes a global-aware fusion module within a multi-scale CNN framework for improved speech emotion recognition, addressing limitations of local attention mechanisms.

Findings

1. Achieved improvements of 2.5% to 4.5% on the IEMOCAP benchmark.
2. Effectively captures multi-scale emotional features.
3. Outperforms state-of-the-art methods.

Abstract

Speech Emotion Recognition (SER) is a fundamental task that predicts the emotion label of speech data. Most recent works use convolutional neural networks (CNNs) to learn local attention maps on fixed-scale feature representations, treating time-varying spectral features as images. However, the limitations of existing CNNs for SER prevent them from capturing rich emotional features at different scales as well as important global information. In this paper, we propose a novel GLobal-Aware Multi-scale (GLAM) neural network (the code is available at https://github.com/lixiangucas01/GLAM) that learns multi-scale feature representations with a global-aware fusion module to attend to emotional information. Specifically, GLAM iteratively applies multiple convolutional kernels of different scales to learn multiple feature representations. Then, instead of using attention-based methods, a…
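The multi-scale step described in the abstract can be illustrated with a minimal sketch. It is not the GLAM implementation: it assumes a single-channel spectrogram stored as a nested list, fixed averaging kernels standing in for learned weights, and hypothetical kernel sizes of 3, 5 and 7. The helper names `conv2d_same`, `mean_kernel` and `multi_scale_features` are illustrative only. Because each scale uses "same" zero padding, every output map keeps the input's shape, so the maps can be stacked as channels for a later fusion stage, which is not modeled here.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_same(x: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2-D cross-correlation with zero 'same' padding (odd kernel sizes)."""
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    # Positions outside the input contribute zero (zero padding).
                    if 0 <= r < h and 0 <= c < w:
                        acc += x[r][c] * kernel[a][b]
            out[i][j] = acc
    return out


def mean_kernel(k: int) -> Matrix:
    """Hypothetical stand-in for a learned k x k kernel: a uniform average."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def multi_scale_features(spec: Matrix, sizes: Sequence[int] = (3, 5, 7)) -> List[Matrix]:
    """Apply one kernel per scale to the same input; returns one map per scale."""
    return [conv2d_same(spec, mean_kernel(k)) for k in sizes]


# Usage: a toy 4 x 6 "spectrogram" (time x frequency) of ones.
spec = [[1.0] * 6 for _ in range(4)]
maps = multi_scale_features(spec)
print(len(maps), len(maps[0]), len(maps[0][0]))
```

Larger kernels aggregate over wider time-frequency neighborhoods, which is the intuition behind combining several scales; GLAM applies such kernels iteratively, whereas this sketch shows a single step.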

Code & Models

Repositories

lixiangucas01/glam (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Emotion and Mood Recognition · Speech Recognition and Synthesis