Multi-label Transformer for Action Unit Detection
Gauthier Tallec, Edouard Yvinec, Arnaud Dapogny, Kevin Bailly

TL;DR
This paper introduces a multi-label detection transformer utilizing multi-head attention for facial Action Unit detection, leveraging large annotated datasets to improve recognition accuracy.
Contribution
It presents a novel transformer-based approach for AU detection that effectively identifies relevant facial regions for each action unit.
Findings
Improved AU detection accuracy on the ABAW dataset
Effective use of multi-head attention for facial region relevance
Successful submission to the ABAW3 challenge
Abstract
Action Unit (AU) detection is the branch of affective computing that aims at recognizing unitary facial muscular movements. It is key to unlocking unbiased computational face representations and has therefore aroused great interest in the past few years. One of the main obstacles to building efficient deep-learning-based AU detection systems is the lack of large facial image databases annotated by AU experts. To that end, the ABAW challenge paves the way toward better AU detection, as it involves a 2M-frame AU-annotated dataset. In this paper, we present our submission to the ABAW3 challenge. In a nutshell, we applied a multi-label detection transformer that leverages multi-head attention to learn which part of the face image is the most relevant to predict each AU.
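The core idea in the abstract, one learned query per AU attending over face image regions through multi-head attention, can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the single cross-attention layer, the random weights, the sizes (49 patches, 32-dimensional features, 12 AUs, 4 heads), and the names `au_cross_attention` and `predict_aus` are all assumptions made for illustration. The patch features are assumed to come from some image backbone that is not shown.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def au_cross_attention(patches, queries, Wq, Wk, Wv, num_heads):
    """Multi-head cross-attention from AU queries to image patches.

    patches: (P, D) patch features, queries: (A, D) one learned query per AU.
    Returns the attended features (A, D) and attention maps (H, A, P).
    """
    P, D = patches.shape
    A = queries.shape[0]
    dh = D // num_heads  # per-head feature size
    # Project, then split the feature axis into heads: (H, n, dh).
    q = (queries @ Wq).reshape(A, num_heads, dh).transpose(1, 0, 2)
    k = (patches @ Wk).reshape(P, num_heads, dh).transpose(1, 0, 2)
    v = (patches @ Wv).reshape(P, num_heads, dh).transpose(1, 0, 2)
    # Scaled dot-product scores between each AU query and each patch.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (H, A, P)
    attn = softmax(scores, axis=-1)                  # each row sums to 1 over patches
    out = attn @ v                                   # (H, A, dh)
    # Merge heads back into a single feature vector per AU.
    return out.transpose(1, 0, 2).reshape(A, D), attn

def predict_aus(patches, params, num_heads=4):
    attended, attn = au_cross_attention(
        patches, params["queries"], params["Wq"], params["Wk"], params["Wv"], num_heads
    )
    # One linear head per AU, then a sigmoid: AUs are not mutually exclusive,
    # so each one gets an independent probability (multi-label prediction).
    logits = (attended * params["w_out"]).sum(axis=1) + params["b_out"]
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs, attn

# Illustrative sizes and random (untrained) parameters.
rng = np.random.default_rng(0)
P, D, A, H = 49, 32, 12, 4
params = {
    "queries": rng.normal(size=(A, D)),
    "Wq": rng.normal(size=(D, D)) / np.sqrt(D),
    "Wk": rng.normal(size=(D, D)) / np.sqrt(D),
    "Wv": rng.normal(size=(D, D)) / np.sqrt(D),
    "w_out": rng.normal(size=(A, D)) / np.sqrt(D),
    "b_out": np.zeros(A),
}
patches = rng.normal(size=(P, D))  # stand-in for backbone features of one face image
probs, attn = predict_aus(patches, params, num_heads=H)
print(probs.shape, attn.shape)
```

In this sketch, `attn[h, a]` is a distribution over the patches for head `h` and AU `a`, which is the kind of quantity one would inspect to see which face regions a given AU relies on once the model has been trained.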
Taxonomy
Topics: Emotion and Mood Recognition · Gaze Tracking and Assistive Technology
Methods: Softmax · Linear Layer
