Attention Driven Fusion for Multi-Modal Emotion Recognition

Darshana Priyasad; Tharindu Fernando; Simon Denman; Clinton Fookes; Sridha Sridharan

arXiv:2009.10991·eess.AS·October 13, 2020

TL;DR

This paper introduces a deep learning approach that fuses acoustic and text data using attention mechanisms and specialized feature extraction layers to improve emotion recognition accuracy on the IEMOCAP dataset.

Contribution

It proposes a novel multi-modal fusion method with SincNet for acoustic features and cross attention for text, enhancing emotion classification performance.

Findings

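01 Achieved a 3.5% improvement in weighted accuracy over state-of-the-art methods on IEMOCAP.

02 Used a SincNet layer to extract acoustic features from raw audio, which the authors report is more effective than applying convolutions directly to the raw speech signal.

03 Introduced cross attention to model N-gram level correlations in text.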
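
To make the idea of cross attention between text feature streams concrete, here is a minimal NumPy sketch. It is a generic scaled dot-product cross attention, not the authors' formulation, which this summary does not specify. The function names `softmax` and `cross_attention`, and the reading of the two inputs as features from different N-gram sizes, are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Generic scaled dot-product cross attention (illustrative assumption).

    queries: (n_q, d) features from one stream, e.g. bigram-level features.
    context: (n_c, d) features from another stream, e.g. trigram-level features.
    Returns the attended features (n_q, d) and the attention weights (n_q, n_c).
    """
    d = queries.shape[-1]
    # Similarity between every query position and every context position.
    scores = queries @ context.T / np.sqrt(d)
    # Each query position gets a distribution over context positions.
    weights = softmax(scores, axis=-1)
    # Weighted sum of context features for each query position.
    attended = weights @ context
    return attended, weights
```

Calling `cross_attention(a, b)` with `a` of shape `(n_q, d)` and `b` of shape `(n_c, d)` returns one attended vector per row of `a`, so each position in one stream is re-expressed in terms of the positions it correlates with in the other.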

Abstract

Deep learning has emerged as a powerful alternative to hand-crafted methods for emotion recognition on combined acoustic and text modalities. Baseline systems model emotion information in text and acoustic modes independently using Deep Convolutional Neural Networks (DCNN) and Recurrent Neural Networks (RNN), followed by applying attention, fusion, and classification. In this paper, we present a deep learning-based approach to exploit and fuse text and acoustic data for emotion classification. We utilize a SincNet layer, based on parameterized sinc functions with band-pass filters, to extract acoustic features from raw audio followed by a DCNN. This approach learns filter banks tuned for emotion recognition and provides more effective features compared to directly applying convolutions over the raw speech signal. For text processing, we use two branches (a DCNN and a Bi-direction RNN…
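
The SincNet idea mentioned in the abstract, band-pass filters defined by parameterized sinc functions, can be sketched as follows. Each filter is determined by only two cutoff frequencies and is built as the difference of two low-pass sinc filters. This is a minimal NumPy illustration with fixed cutoffs, not the authors' implementation: in a trained SincNet layer the cutoffs are learnable parameters. The names `sinc_bandpass` and `sinc_layer`, the kernel size of 251, the 16 kHz sample rate, and the Hamming window are assumptions.

```python
import numpy as np


def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=251, sample_rate=16000):
    """Windowed band-pass FIR kernel defined by two cutoff frequencies.

    The kernel is the difference of two low-pass sinc filters, so the pair
    (f_low_hz, f_high_hz) is the only filter-specific parameter.
    """
    f1 = f_low_hz / sample_rate   # normalized low cutoff (cycles per sample)
    f2 = f_high_hz / sample_rate  # normalized high cutoff (cycles per sample)
    # Symmetric time axis centered on zero.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # np.sinc(x) is sin(pi * x) / (pi * x), the normalized sinc.
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # A window smooths the truncation of the ideal (infinite) filter.
    return kernel * np.hamming(kernel_size)


def sinc_layer(waveform, bands_hz, kernel_size=251, sample_rate=16000):
    """Apply one band-pass filter per (low, high) pair to a raw 1-D waveform.

    Returns an array of shape (num_bands, len(waveform)).
    """
    kernels = [sinc_bandpass(lo, hi, kernel_size, sample_rate) for lo, hi in bands_hz]
    return np.stack([np.convolve(waveform, k, mode="same") for k in kernels])
```

Because each filter is described by two numbers rather than a full set of kernel weights, a learnable version has far fewer parameters in its first layer than a standard convolution over raw audio, and the resulting filter bank can be read directly as frequency bands.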

Taxonomy

Methods: Diffusion-Convolutional Neural Networks