Multi-modal Facial Action Unit Detection with Large Pre-trained Models for the 5th Competition on Affective Behavior Analysis in-the-wild
Yufeng Yin, Minh Tran, Di Chang, Xinrui Wang, Mohammad Soleymani

TL;DR
This paper introduces a multi-modal approach for facial action unit detection using large pre-trained models across visual, acoustic, and lexical features, achieving competitive results in the ABAW 2023 Challenge.
Contribution
It presents a novel multi-modal method combining visual, acoustic, and lexical features with pre-trained models for improved AU detection.
Findings
Achieved an F1 score of 52.3% on the official validation set of the 5th ABAW Challenge (ABAW 2023).
Enhanced visual features with super-resolution and face alignment.
Demonstrated the effectiveness of multi-modal features in AU detection.
Abstract
Facial action unit (AU) detection has emerged as an important task within facial expression analysis, aimed at detecting specific pre-defined, objective facial expressions, such as lip tightening and cheek raising. This paper presents our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2023 Competition for AU detection. We propose a multi-modal method for facial action unit detection with visual, acoustic, and lexical features extracted from large pre-trained models. To provide high-quality details for visual feature extraction, we apply super-resolution and face alignment to the training data and show a potential performance gain. Our approach achieves an F1 score of 52.3% on the official validation set of the 5th ABAW Challenge.
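The abstract does not spell out how the three modalities are combined or how the F1 score is averaged, so the following is only a minimal sketch under stated assumptions: features are assumed to be pre-extracted per frame, fusion is assumed to be plain concatenation, the classifier is a hypothetical linear layer with one sigmoid output per AU, and F1 is assumed to be the unweighted mean over AUs. The function names (`fuse_features`, `predict_aus`, `macro_f1`) and all dimensions are illustrative and do not come from the paper.

```python
import numpy as np


def fuse_features(visual, acoustic, lexical):
    """Concatenate per-frame feature vectors along the feature axis.

    Assumption: simple concatenation; the paper's actual fusion may differ.
    """
    return np.concatenate([visual, acoustic, lexical], axis=-1)


def predict_aus(fused, weights, bias, threshold=0.5):
    """Multi-label prediction: one independent sigmoid per AU, then threshold.

    `weights` and `bias` stand in for a trained linear head (hypothetical).
    """
    logits = fused @ weights + bias
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (probs >= threshold).astype(int)


def macro_f1(y_true, y_pred):
    """Compute F1 separately for each AU column, then take the unweighted mean."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = ((y_true == 1) & (y_pred == 1)).sum(axis=0)
    fp = ((y_true == 0) & (y_pred == 1)).sum(axis=0)
    fn = ((y_true == 1) & (y_pred == 0)).sum(axis=0)
    denom = 2 * tp + fp + fn
    # An AU with no positives and no predicted positives scores 0 here.
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(f1.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_frames, n_aus = 8, 12  # illustrative sizes only
    visual = rng.normal(size=(n_frames, 16))
    acoustic = rng.normal(size=(n_frames, 8))
    lexical = rng.normal(size=(n_frames, 4))

    fused = fuse_features(visual, acoustic, lexical)
    weights = rng.normal(size=(fused.shape[1], n_aus))
    bias = np.zeros(n_aus)

    preds = predict_aus(fused, weights, bias)
    labels = rng.integers(0, 2, size=(n_frames, n_aus))
    print("fused shape:", fused.shape)
    print("macro F1 on random data:", macro_f1(labels, preds))
```

Treating each AU as an independent binary label reflects that several action units can be active in the same frame; averaging per-AU F1 keeps rare AUs from being swamped by frequent ones.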
Taxonomy
Topics: Face recognition and analysis · Emotion and Mood Recognition
