Facial Action Unit Detection and Intensity Estimation from Self-supervised Representation
Bowen Ma, Rudong An, Wei Zhang, Yu Ding, Zeng Zhao, Rongsheng Zhang, Tangjie Lv, Changjie Fan, Zhipeng Hu

TL;DR
This paper presents MAE-Face, a self-supervised facial representation model that significantly improves AU detection and intensity estimation, especially under limited labeled data, by leveraging masked autoencoding pre-training.
Contribution
Introducing MAE-Face, a novel self-supervised pre-training approach for facial action unit analysis that reduces dependency on manual annotations and enhances robustness.
Findings
Achieves state-of-the-art performance on AU detection and intensity estimation.
Performs well even with only 1% of labeled training data.
Demonstrates strong generalization across datasets.
Abstract
As a fine-grained and local expression behavior measurement, facial action unit (FAU) analysis (e.g., detection and intensity estimation) is well documented as requiring time-consuming, labor-intensive, and error-prone annotation. Thus, a long-standing challenge of FAU analysis arises from the scarcity of manual annotations, which limits the generalization ability of trained models to a large extent. Many previous works have tried to alleviate this issue via semi-supervised or weakly supervised methods and extra auxiliary information. However, these methods still require domain knowledge and have not yet avoided the high dependency on data annotation. This paper introduces a robust facial representation model MAE-Face for AU analysis. Using masked autoencoding as the self-supervised pre-training approach, MAE-Face first learns a high-capacity model from a feasible collection of face images…
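The masked-autoencoding idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 16-pixel patch size, the 224x224 input, and the 75% mask ratio are all assumptions chosen for illustration (the ratio follows the generic masked autoencoder recipe, not a value stated on this page). It only shows the two core steps: hide a random subset of image patches, then score a reconstruction on the hidden patches alone.

```python
import numpy as np

def random_patch_mask(image, patch_size=16, mask_ratio=0.75, seed=0):
    """Split an image into patches and hide a random subset (hypothetical helper)."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (num_patches, p*p*c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, -1)
    num_patches = gh * gw
    num_keep = int(round(num_patches * (1 - mask_ratio)))
    rng = np.random.default_rng(seed)
    keep_idx = np.sort(rng.permutation(num_patches)[:num_keep])
    mask = np.ones(num_patches, dtype=bool)  # True means "hidden from the encoder"
    mask[keep_idx] = False
    return patches[keep_idx], mask, patches

def masked_mse(pred, target, mask):
    """Mean squared error computed only over the hidden patches."""
    return float(((pred[mask] - target[mask]) ** 2).mean())

# Stand-in for a face crop; real pre-training would use actual face images.
image = np.random.default_rng(1).random((224, 224, 3))
visible, mask, patches = random_patch_mask(image)

# A trained decoder would predict the hidden patches; zeros stand in here.
loss = masked_mse(np.zeros_like(patches), patches, mask)
print(visible.shape, int(mask.sum()), loss)
```

In a full pipeline an encoder would see only `visible`, and a lightweight decoder would try to reconstruct the hidden patches; the pre-trained encoder would then be fine-tuned on labeled AU data.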
Taxonomy
Topics: Emotion and Mood Recognition · Speech and Audio Processing · Gaze Tracking and Assistive Technology
