Continuous Emotion Recognition using Visual-audio-linguistic information: A Technical Report for ABAW3

Su Zhang; Ruyi An; Yi Ding; Cuntai Guan

arXiv:2203.13031·cs.MM·March 31, 2022

TL;DR

This paper introduces a cross-modal co-attention model that fuses visual, audio, and linguistic information for continuous emotion recognition. On the ABAW3 test set it reaches a CCC of 0.520 for valence and 0.602 for arousal, against baseline scores of 0.180 and 0.170.

Contribution

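It presents a four-block model in which visual, audio, and linguistic blocks learn spatial-temporal features and a multi-head co-attention block fuses them, with cross-validation over the training and validation sets to use the data fully and reduce over-fitting.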

Findings

01 CCC of 0.520 for valence and 0.602 for arousal on the test set

02 Improvement over the baseline CCC of 0.180 (valence) and 0.170 (arousal)

03 Multi-modal feature fusion with a multi-head co-attention mechanism

Abstract

We propose a cross-modal co-attention model for continuous emotion recognition using visual-audio-linguistic information. The model consists of four blocks. The visual, audio, and linguistic blocks are used to learn the spatial-temporal features of the multi-modal input. A co-attention block is designed to fuse the learned features with the multi-head co-attention mechanism. The visual encoding from the visual block is concatenated with the attention feature to emphasize the visual information. To make full use of the data and alleviate over-fitting, cross-validation is carried out on the training and validation set. The concordance correlation coefficient (CCC) centering is used to merge the results from each fold. The achieved CCC on the test set is 0.520 for valence and 0.602 for arousal, which significantly outperforms the baseline method with the corresponding CCC of 0.180 and…
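All scores above are reported as the concordance correlation coefficient. As a minimal sketch of how that metric is usually computed (Lin's standard definition, not code taken from the authors' repository), the function below uses only the Python standard library. The function name and the toy values are illustrative assumptions, and the sketch does not cover the CCC centering step used to merge folds.

```python
from statistics import fmean


def concordance_correlation_coefficient(predictions, labels):
    """Lin's CCC between two equal-length sequences of numbers.

    Illustrative helper (hypothetical name), not the authors' implementation.
    """
    if len(predictions) != len(labels) or len(predictions) < 2:
        raise ValueError("inputs must have the same length (at least 2)")

    mean_p = fmean(predictions)
    mean_l = fmean(labels)
    n = len(predictions)

    # Population (divide-by-n) variance and covariance, as in Lin's definition.
    var_p = sum((p - mean_p) ** 2 for p in predictions) / n
    var_l = sum((l - mean_l) ** 2 for l in labels) / n
    cov = sum((p - mean_p) * (l - mean_l) for p, l in zip(predictions, labels)) / n

    # The squared mean difference penalizes a constant offset, which plain
    # Pearson correlation would ignore.
    denominator = var_p + var_l + (mean_p - mean_l) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both sequences are the same constant")
    return 2 * cov / denominator


# Toy usage with made-up valence traces (not data from the paper).
toy_labels = [0.10, 0.40, 0.35, 0.80]
toy_predictions = [0.20, 0.35, 0.50, 0.70]
toy_score = concordance_correlation_coefficient(toy_predictions, toy_labels)
```

A prediction that matches the labels exactly scores 1, and a prediction that tracks the labels with a constant offset scores below 1, which is why CCC is stricter than correlation alone for continuous valence and arousal traces.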

Code & Models

Repositories

sucv/abaw3 (PyTorch, official)

Taxonomy

Topics: Emotion and Mood Recognition · Advanced Computing and Algorithms · Video Surveillance and Tracking Methods