Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition

R. Gnana Praveen; Eric Granger; Patrick Cardinal

arXiv:2111.05222 · cs.CV · July 9, 2024 · 1 citation

TL;DR

This paper proposes a cross-attentional audio-visual fusion model for dimensional emotion recognition that captures inter-modal relationships between facial and vocal features; it is reported to outperform state-of-the-art fusion methods on the RECOLA dataset.

Contribution

It introduces a cross-attention mechanism for audio-visual fusion, enhancing the extraction of salient features for continuous emotion prediction.

Findings

01. Outperforms state-of-the-art fusion methods on the RECOLA dataset.
02. Demonstrates effectiveness on a private fatigue dataset.
03. Provides a cost-effective and accurate multimodal emotion recognition approach.

Abstract

Multimodal analysis has recently drawn much interest in affective computing, since it can improve the overall accuracy of emotion recognition over isolated uni-modal approaches. The most effective techniques for multimodal emotion recognition efficiently leverage diverse and complementary sources of information, such as facial, vocal, and physiological modalities, to provide comprehensive feature representations. In this paper, we focus on dimensional emotion recognition based on the fusion of facial and vocal modalities extracted from videos, where complex spatiotemporal relationships may be captured. Most of the existing fusion techniques rely on recurrent networks or conventional attention mechanisms that do not effectively leverage the complementary nature of audio-visual (A-V) modalities. We introduce a cross-attentional fusion approach to extract the salient features across A-V…
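
To make the idea of cross-attentional A-V fusion more concrete, here is a minimal NumPy sketch of a generic cross-attention step between an audio and a visual feature sequence. It is an illustration under stated assumptions, not the authors' implementation: the function name `cross_attention_fusion`, the feature sizes, the random (rather than learned) weight matrix `w`, and the final concatenation are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention_fusion(audio, visual, w):
    """Fuse two temporally aligned feature sequences with cross-attention.

    audio:  (T, d_a) audio features, one row per time step
    visual: (T, d_v) visual features, one row per time step
    w:      (d_a, d_v) cross-modal weights (learned in a real model;
            passed in here as an assumption of this sketch)
    """
    # Cross-correlation between every audio step and every visual step.
    corr = audio @ w @ visual.T                      # (T, T)

    # Each audio step attends over all visual steps (rows sum to 1).
    audio_to_visual = softmax(corr, axis=1)          # (T, T)
    # Each visual step attends over all audio steps (rows sum to 1).
    visual_to_audio = softmax(corr.T, axis=1)        # (T, T)

    # Context vectors: content of one modality weighted by the other.
    visual_ctx = audio_to_visual @ visual            # (T, d_v)
    audio_ctx = visual_to_audio @ audio              # (T, d_a)

    # Hypothetical fusion choice: concatenate the attended features.
    fused = np.concatenate([audio_ctx, visual_ctx], axis=1)  # (T, d_a + d_v)
    return fused, audio_to_visual, visual_to_audio


rng = np.random.default_rng(0)
T, d_a, d_v = 5, 4, 6                                # illustrative sizes only
audio = rng.standard_normal((T, d_a))
visual = rng.standard_normal((T, d_v))
w = rng.standard_normal((d_a, d_v))

fused, a2v, v2a = cross_attention_fusion(audio, visual, w)
print(fused.shape)
```

In a full model, the fused sequence would typically feed a regression head that predicts continuous emotion dimensions, and `w` would be trained end to end; both are left out here to keep the sketch focused on the attention step.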

Code & Models

Repositories

praveena2j/cross-attentional-av-fusion
PyTorch · Official

Taxonomy

Topics: Emotion and Mood Recognition · Speech and Audio Processing