Multi-modal Facial Action Unit Detection with Large Pre-trained Models for the 5th Competition on Affective Behavior Analysis in-the-wild

Yufeng Yin; Minh Tran; Di Chang; Xinrui Wang; Mohammad Soleymani

arXiv:2303.10590·cs.CV·April 19, 2023·5 cites

TL;DR

This paper introduces a multi-modal approach to facial action unit (AU) detection that combines visual, acoustic, and lexical features extracted from large pre-trained models, reaching an F1 score of 52.3% on the official validation set of the ABAW 2023 Challenge.

Contribution

It presents a novel multi-modal method combining visual, acoustic, and lexical features with pre-trained models for improved AU detection.

Findings

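01 Achieved an F1 score of 52.3% on the official validation set of the 5th ABAW Challenge (ABAW 2023).

02 Applied super-resolution and face alignment to the training data to provide higher-quality detail for visual feature extraction.

03 Demonstrated the effectiveness of multi-modal features in AU detection.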

Abstract

Facial action unit (AU) detection has emerged as an important task within facial expression analysis, aimed at detecting specific pre-defined, objective facial expressions, such as lip tightening and cheek raising. This paper presents our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2023 Competition for AU detection. We propose a multi-modal method for facial action unit detection with visual, acoustic, and lexical features extracted from large pre-trained models. To provide high-quality details for visual feature extraction, we apply super-resolution and face alignment to the training data and show a potential performance gain. Our approach achieves an F1 score of 52.3% on the official validation set of the 5th ABAW Challenge.
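The abstract reports a single F1 score but does not spell out how it is aggregated. As a rough illustration only, the sketch below assumes AU detection is scored as a multi-label problem: a binary F1 is computed for each AU separately and the per-AU scores are then averaged without weighting. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for one AU: 2*TP / (2*TP + FP + FN); 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_f1_over_aus(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> float:
    """Unweighted mean of per-AU F1 (assumed aggregation, see lead-in).

    Rows are frames, columns are AUs; entries are 0 (absent) or 1 (present).
    """
    num_aus = len(y_true[0])
    scores = [
        f1_binary([row[k] for row in y_true], [row[k] for row in y_pred])
        for k in range(num_aus)
    ]
    return sum(scores) / num_aus


# Toy example: 4 frames, 2 hypothetical AUs.
labels = [[1, 0], [1, 1], [0, 1], [0, 0]]
preds = [[1, 0], [0, 1], [0, 1], [1, 0]]
print(mean_f1_over_aus(labels, preds))  # 0.75 (per-AU F1: 0.5 and 1.0)
```

Averaging per AU rather than pooling all labels keeps rarely activated AUs from being swamped by frequent ones, which is one common reason to report the score this way.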

Taxonomy

Topics: Face recognition and analysis · Emotion and Mood Recognition