AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors
Kaishen Yuan, Zitong Yu, Xin Liu, Weicheng Xie, Huanjing Yue, Jingyu Yang

TL;DR
AUFormer introduces a parameter-efficient vision transformer-based method for facial action unit detection, utilizing a novel MoKE mechanism and specialized loss to improve accuracy and generalization without extra data.
Contribution
The paper proposes AUFormer with MoKE collaboration and MDWA-Loss, specifically designed for AU detection, achieving state-of-the-art results with fewer parameters and no extra data.
Findings
State-of-the-art performance in AU detection
Robust generalization across domains
Strong data efficiency
Abstract
Facial Action Units (AUs) are a vital concept in affective computing, and AU detection has long been an active research topic. Existing methods suffer from overfitting, either because they train a large number of learnable parameters on scarce AU-annotated datasets or because they rely heavily on substantial additional relevant data. Parameter-Efficient Transfer Learning (PETL) provides a promising paradigm to address these challenges, but existing PETL methods are not designed around AU characteristics. Therefore, we apply the PETL paradigm to AU detection, introducing AUFormer and proposing a novel Mixture-of-Knowledge Expert (MoKE) collaboration mechanism. An individual MoKE, specific to a certain AU and with minimal learnable parameters, first integrates personalized multi-scale and correlation knowledge. Then the MoKE collaborates with other MoKEs in the expert group to…
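To make the parameter-efficiency argument concrete, the following is a back-of-the-envelope sketch of how few parameters a PETL setup trains compared with the frozen backbone. It counts parameters for a generic bottleneck adapter, a common PETL building block, and is not the MoKE design from the paper. The width, depth, bottleneck size, and number of AUs are all hypothetical values chosen for illustration.

```python
def vit_block_params(d: int, mlp_ratio: int = 4) -> int:
    """Learnable parameters in one standard ViT encoder block of width d."""
    attention = 4 * d * d + 4 * d                # Q, K, V and output projections, with biases
    hidden = mlp_ratio * d
    mlp = d * hidden + hidden + hidden * d + d   # two linear layers, with biases
    layer_norms = 2 * 2 * d                      # two LayerNorms, scale and shift each
    return attention + mlp + layer_norms


def adapter_params(d: int, r: int) -> int:
    """Learnable parameters in one bottleneck adapter (d -> r -> d), with biases."""
    return (d * r + r) + (r * d + d)


# Hypothetical setting (not taken from the paper): a ViT-B-like width and depth,
# with one generic adapter per AU in every block.
WIDTH, DEPTH, BOTTLENECK, NUM_AUS = 768, 12, 16, 12

frozen = DEPTH * vit_block_params(WIDTH)                         # backbone blocks, kept frozen
trainable = DEPTH * NUM_AUS * adapter_params(WIDTH, BOTTLENECK)  # adapters, the only trained part

print(f"frozen backbone blocks: {frozen:,}")
print(f"trainable adapters:     {trainable:,}")
print(f"trainable share:        {trainable / (frozen + trainable):.2%}")
```

Under these assumptions the trained adapters account for only a few percent of the total parameter count, which is the kind of budget that makes overfitting on small AU-annotated datasets less likely.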
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection · Face recognition and analysis
Methods: Attention Is All You Need · Byte Pair Encoding · Dropout · Softmax · Dense Connections · Label Smoothing · Adam · Focus · Residual Connection · Absolute Position Encodings
