Temporal aggregation of audio-visual modalities for emotion recognition

Andreea Birhala; Catalin Nicolae Ristea; Anamaria Radoi; Liviu Cristian Dutu

arXiv:2007.04364·cs.CV·July 10, 2020


TL;DR

This paper proposes a multimodal fusion technique for emotion recognition that combines audio and visual inputs from a temporal window, using a different temporal offset for each modality. The authors report that it outperforms other methods from the literature, as well as human accuracy, on the CREMA-D dataset.

Contribution

The paper proposes a temporal aggregation method for audio-visual emotion recognition that fuses the two modalities over a temporal window, with a different temporal offset for each modality; the authors report improved accuracy over prior methods.

Findings

01. Outperforms existing methods on the CREMA-D dataset
02. Achieves higher accuracy than human raters
03. Demonstrates the effectiveness of temporal window fusion

Abstract

Emotion recognition has a pivotal role in affective computing and in human-computer interaction. The current technological developments lead to increased possibilities of collecting data about the emotional state of a person. In general, human perception regarding the emotion transmitted by a subject is based on vocal and visual information collected in the first seconds of interaction with the subject. As a consequence, the integration of verbal (i.e., speech) and non-verbal (i.e., image) information seems to be the preferred choice in most of the current approaches towards emotion recognition. In this paper, we propose a multimodal fusion technique for emotion recognition based on combining audio-visual modalities from a temporal window with different temporal offsets for each modality. We show that our proposed method outperforms other methods from the literature and human accuracy…
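
To make the idea of "a temporal window with different temporal offsets for each modality" concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not specify the feature extractors, window length, offsets, or fusion operator, so the function names (`window_slice`, `fuse_window`), the mean-pooling step, the concatenation-based fusion, and all numeric values below are assumptions chosen purely for illustration.

```python
import numpy as np


def window_slice(features, rate_hz, start_s, length_s):
    """Return the rows of `features` covering [start_s, start_s + length_s).

    `features` is a (num_steps, dim) array sampled at `rate_hz` steps per second.
    Indices are clipped to the bounds of the clip.
    """
    first = max(int(round(start_s * rate_hz)), 0)
    last = min(int(round((start_s + length_s) * rate_hz)), len(features))
    return features[first:last]


def fuse_window(audio, video, audio_rate_hz, video_rate_hz,
                window_start_s, window_len_s,
                audio_offset_s=0.0, video_offset_s=0.0):
    """Fuse one temporal window, shifting each modality by its own offset.

    Assumption: each modality is mean-pooled over its window and the two
    pooled vectors are concatenated. The paper may use a different
    aggregation; this only illustrates per-modality offsets.
    """
    a = window_slice(audio, audio_rate_hz,
                     window_start_s + audio_offset_s, window_len_s)
    v = window_slice(video, video_rate_hz,
                     window_start_s + video_offset_s, window_len_s)
    if len(a) == 0 or len(v) == 0:
        raise ValueError("window falls outside the clip for at least one modality")
    return np.concatenate([a.mean(axis=0), v.mean(axis=0)])


# Hypothetical 1-second clip: 100 audio feature steps (dim 4), 30 video frames (dim 3).
audio_feats = np.arange(400, dtype=float).reshape(100, 4)
video_feats = np.ones((30, 3))

# A 0.5 s window starting at t = 0, with the audio stream shifted by 0.2 s.
fused = fuse_window(audio_feats, video_feats, 100, 30,
                    window_start_s=0.0, window_len_s=0.5,
                    audio_offset_s=0.2, video_offset_s=0.0)
print(fused.shape)
```

In a full pipeline the fused vector would feed a classifier over the emotion classes; keeping the two offsets as separate parameters is what lets the audio and visual windows be misaligned on purpose.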
