A Multi-modal and Multi-task Learning Method for Action Unit and Expression Recognition

Yue Jin; Tianqing Zheng; Chao Gao; Guoqiang Xu

arXiv:2107.04187 · cs.CV · July 16, 2021 · 21 cites

TL;DR

This paper presents a multi-modal, multi-task learning approach utilizing visual and audio data for in-the-wild human affect analysis, improving action unit and expression recognition performance.

Contribution

It introduces a novel multi-modal, multi-task framework combining visual and audio cues with sequence modeling for affect recognition in unconstrained environments.

Findings

1. Achieved an AU score of 0.712 on the validation set.

2. Achieved an expression score of 0.477 on the validation set.

3. Demonstrated the approach's effectiveness for in-the-wild affect analysis.
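
To help read the two scores, here is a minimal sketch of how a combined F1/accuracy score can be computed. This page does not define the metrics. The weights used below (0.5 on F1 for AUs, 0.67 on F1 for expressions) are an assumption based on our recollection of the ABAW 2021 challenge rules and should be checked against the official challenge description. The function names and the example counts are hypothetical.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def combined_score(f1, accuracy, f1_weight):
    """Weighted mix of F1 and accuracy; the remaining weight goes to accuracy."""
    return f1_weight * f1 + (1.0 - f1_weight) * accuracy


# Assumed weights (not stated on this page): verify against the challenge rules.
AU_F1_WEIGHT = 0.5
EXPR_F1_WEIGHT = 0.67

# Hypothetical counts, for illustration only.
example_f1 = f1_score(tp=8, fp=2, fn=2)
example_au_score = combined_score(example_f1, accuracy=0.9, f1_weight=AU_F1_WEIGHT)
```

Under this kind of metric, a score can be dominated by accuracy when classes are imbalanced, which is one reason the F1 term is included.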

Abstract

Analyzing human affect is vital for human-computer interaction systems. Most methods are developed in restricted scenarios which are not practical for in-the-wild settings. The Affective Behavior Analysis in-the-wild (ABAW) 2021 Contest provides a benchmark for this in-the-wild problem. In this paper, we introduce a multi-modal and multi-task learning method by using both visual and audio information. We use both AU and expression annotations to train the model and apply a sequence model to further extract associations between video frames. We achieve an AU score of 0.712 and an expression score of 0.477 on the validation set. These results demonstrate the effectiveness of our approach in improving model performance.
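
The abstract's idea of training one model on both AU and expression annotations can be sketched as a weighted sum of two losses. This is an illustrative sketch, not the authors' implementation: it assumes AUs are treated as independent binary labels (binary cross-entropy) and expressions as a single-label classification (softmax cross-entropy). The task weights, the number of AUs, the number of expression classes, and all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def au_bce_loss(logits, targets):
    """Mean binary cross-entropy over independent AU labels (multi-label)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)


def expr_ce_loss(logits, target_index):
    """Softmax cross-entropy for a single expression label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def multitask_loss(au_logits, au_targets, expr_logits, expr_target,
                   au_weight=1.0, expr_weight=1.0):
    """Weighted sum of the two task losses (weights are hypothetical)."""
    return (au_weight * au_bce_loss(au_logits, au_targets)
            + expr_weight * expr_ce_loss(expr_logits, expr_target))


# Hypothetical sizes: 12 AU outputs and 7 expression classes.
loss = multitask_loss([0.0] * 12, [1, 0] * 6, [0.0] * 7, expr_target=3)
```

In a full system, both sets of logits would come from shared visual and audio features passed through a sequence model, so gradients from each task shape the same representation.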

Taxonomy

Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Social Robot Interaction and HRI