Efficient Speech Emotion Recognition Using Multi-Scale CNN and Attention

Zixuan Peng; Yu Lu; Shengfeng Pan; Yunfeng Liu

arXiv:2106.04133 · cs.SD · June 9, 2021

TL;DR

This paper introduces an efficient multi-scale CNN and attention-based neural network architecture that effectively combines acoustic and lexical features for speech emotion recognition, outperforming previous methods on the IEMOCAP dataset.

Contribution

The paper proposes a novel multi-scale CNN architecture with attention mechanisms to integrate acoustic and lexical features for improved speech emotion recognition.

Findings

01 Outperforms state-of-the-art on IEMOCAP with 5% accuracy improvement

02 Effective fusion of audio and text features using MSCNN and attention modules

03 Achieves higher weighted and unweighted accuracy in emotion classification
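
The two metrics named in the findings can be illustrated with a short sketch. It assumes the usual convention in speech emotion recognition: weighted accuracy (WA) is the overall fraction of correctly classified utterances, and unweighted accuracy (UA) is the mean of per-class recalls. The function names and the toy labels are invented for illustration and are not taken from the paper or its repository.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: fraction of all utterances classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy labels, made up for illustration only (not results from the paper).
y_true = ["angry", "angry", "angry", "angry", "happy", "happy", "sad", "neutral"]
y_pred = ["angry", "angry", "angry", "angry", "happy", "sad", "sad", "angry"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA={wa:.3f} UA={ua:.3f}")
```

On imbalanced data the two numbers diverge: here the majority class is classified perfectly, which lifts WA, while the missed minority class pulls UA down. Reporting both guards against a model that only does well on frequent emotions.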

Abstract

Emotion recognition from speech is a challenging task. Recent advances in deep learning have established bi-directional recurrent neural networks (Bi-RNN) and attention mechanisms as a standard method for speech emotion recognition: multi-modal features (audio and text) are extracted and attended to, then fused for downstream emotion classification tasks. In this paper, we propose a simple yet efficient neural network architecture to exploit both acoustic and lexical information from speech. The proposed framework uses multi-scale convolutional layers (MSCNN) to obtain both audio and text hidden representations. Then, a statistical pooling unit (SPU) is used to further extract the features in each modality. In addition, an attention module can be built on top of the MSCNN-SPU (audio) and MSCNN (text) to further improve the performance. Extensive experiments show that the proposed…
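
The MSCNN and SPU stages described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the kernel widths (2, 3, 5), the two filters per width, the random weights, the ReLU activation, and the choice of max, mean, and standard deviation as the pooled statistics are all assumptions made for the example, and the attention module is omitted.

```python
import math
import random


def conv1d(seq, kernel, bias=0.0):
    """Valid 1D convolution over time followed by ReLU.

    seq is a list of T frames, each a list of D features;
    kernel is a list of k rows, each a list of D weights.
    Returns T - k + 1 activations for this single filter.
    """
    k = len(kernel)
    out = []
    for t in range(len(seq) - k + 1):
        s = bias
        for i in range(k):
            for d in range(len(kernel[i])):
                s += seq[t + i][d] * kernel[i][d]
        out.append(max(0.0, s))
    return out


def statistical_pooling(channel):
    """Collapse a variable-length activation sequence to [max, mean, std]."""
    mean = sum(channel) / len(channel)
    var = sum((x - mean) ** 2 for x in channel) / len(channel)
    return [max(channel), mean, math.sqrt(var)]


def mscnn_spu(seq, kernels_by_width):
    """Run filters of several widths, pool each one, concatenate the results."""
    features = []
    for _width, kernels in sorted(kernels_by_width.items()):
        for kernel in kernels:
            features.extend(statistical_pooling(conv1d(seq, kernel)))
    return features


random.seed(0)
T, D = 10, 4                 # hypothetical: 10 frames of 4-dim features
widths, n_filters = (2, 3, 5), 2
seq = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
kernels_by_width = {
    w: [[[random.gauss(0, 0.5) for _ in range(D)] for _ in range(w)]
        for _ in range(n_filters)]
    for w in widths
}

utterance_vec = mscnn_spu(seq, kernels_by_width)
print(len(utterance_vec))    # 3 widths * 2 filters * 3 statistics
```

Because pooling runs over the time axis, the output length depends only on the number of filters and statistics, not on the number of frames, so utterances of different durations map to vectors of the same size.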

Code & Models

Repositories

julianyulu/icassp2021-mscnn-spu (Official)
