AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors

Kaishen Yuan; Zitong Yu; Xin Liu; Weicheng Xie; Huanjing Yue; Jingyu Yang

arXiv:2403.04697 · cs.CV · July 10, 2024 · 1 citation

TL;DR

AUFormer introduces a parameter-efficient vision transformer-based method for facial action unit detection, utilizing a novel MoKE mechanism and specialized loss to improve accuracy and generalization without extra data.

Contribution

The paper proposes AUFormer with MoKE collaboration and MDWA-Loss, specifically designed for AU detection, achieving state-of-the-art results with fewer parameters and no extra data.

Findings

01 State-of-the-art performance in AU detection

02 Robust generalization across domains

03 Data efficiency, with no reliance on extra data

Abstract

Facial Action Units (AUs) are a vital concept in affective computing, and AU detection has long been an active research topic. Existing methods suffer from overfitting, either because they train a large number of learnable parameters on scarce AU-annotated datasets or because they rely heavily on substantial additional relevant data. Parameter-Efficient Transfer Learning (PETL) provides a promising paradigm to address these challenges, but existing PETL methods are not designed around AU characteristics. We therefore apply the PETL paradigm to AU detection, introducing AUFormer and proposing a novel Mixture-of-Knowledge Expert (MoKE) collaboration mechanism. An individual MoKE, specific to a certain AU and with minimal learnable parameters, first integrates personalized multi-scale and correlation knowledge. Then the MoKE collaborates with other MoKEs in the expert group to…
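
To give a rough sense of what "minimal learnable parameters" can mean under PETL, the sketch below counts the trainable parameters when a frozen backbone is paired with small per-AU bottleneck modules. It uses a generic down-project / up-project adapter, not the paper's MoKE design, and every number in it (backbone size, hidden width, bottleneck width, layer count, number of AUs) is a hypothetical value chosen for illustration rather than a figure from the paper.

```python
# Illustrative arithmetic only: a generic bottleneck adapter, NOT the paper's MoKE.


def adapter_params(hidden_dim: int, bottleneck_dim: int) -> int:
    """Parameters in one down-project / up-project adapter, biases included."""
    down = hidden_dim * bottleneck_dim + bottleneck_dim
    up = bottleneck_dim * hidden_dim + hidden_dim
    return down + up


def trainable_fraction(backbone_params: int, hidden_dim: int, bottleneck_dim: int,
                       num_layers: int, num_experts: int) -> float:
    """Share of all parameters that are trainable when the backbone is frozen."""
    trainable = adapter_params(hidden_dim, bottleneck_dim) * num_layers * num_experts
    return trainable / (backbone_params + trainable)


# Hypothetical settings (assumptions, not values taken from the paper).
BACKBONE_PARAMS = 86_000_000  # roughly ViT-Base scale, kept frozen
HIDDEN_DIM = 768              # token embedding width
BOTTLENECK_DIM = 8            # adapter bottleneck width
NUM_LAYERS = 12               # one adapter per transformer layer
NUM_EXPERTS = 12              # one expert per AU, for an assumed 12 AUs

per_adapter = adapter_params(HIDDEN_DIM, BOTTLENECK_DIM)
fraction = trainable_fraction(
    BACKBONE_PARAMS, HIDDEN_DIM, BOTTLENECK_DIM, NUM_LAYERS, NUM_EXPERTS
)
print(f"parameters per adapter: {per_adapter}")
print(f"trainable share of all parameters: {fraction:.2%}")
```

Under these assumed settings the trainable share stays at a few percent of the total, which is the general property that makes PETL attractive on small AU-annotated datasets. The actual parameter budget of AUFormer depends on the MoKE architecture described in the paper.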

Code & Models

Repositories

yuankaishen2001/auformer
PyTorch · Official

Taxonomy

Topics: Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection · Face recognition and analysis

Methods: Attention Is All You Need · Byte Pair Encoding · Dropout · Softmax · Dense Connections · Label Smoothing · Adam · Focus · Residual Connection · Absolute Position Encodings