Facial Action Unit Detection and Intensity Estimation from Self-supervised Representation

Bowen Ma; Rudong An; Wei Zhang; Yu Ding; Zeng Zhao; Rongsheng Zhang; Tangjie Lv; Changjie Fan; Zhipeng Hu

arXiv:2210.15878 · cs.CV · October 31, 2022 · 6 cites

TL;DR

This paper presents MAE-Face, a self-supervised facial representation model that significantly improves AU detection and intensity estimation, especially under limited labeled data, by leveraging masked autoencoding pre-training.

Contribution

Introduces MAE-Face, a self-supervised pre-training approach for facial action unit analysis that reduces dependence on manual annotations and improves robustness.

Findings

01

Achieves state-of-the-art performance on AU detection and intensity estimation.

02

Performs well even with only 1% of labeled training data.

03

Demonstrates strong generalization across datasets.

Abstract

As a fine-grained and local measurement of expression behavior, facial action unit (FAU) analysis (e.g., detection and intensity estimation) is known for its time-consuming, labor-intensive, and error-prone annotation. Thus, a long-standing challenge of FAU analysis arises from the scarcity of manual annotations, which limits the generalization ability of trained models to a large extent. Many previous works have tried to alleviate this issue via semi-supervised or weakly supervised methods and extra auxiliary information. However, these methods still require domain knowledge and have not yet avoided the high dependency on data annotation. This paper introduces a robust facial representation model, MAE-Face, for AU analysis. Using masked autoencoding as the self-supervised pre-training approach, MAE-Face first learns a high-capacity model from a feasible collection of face images…
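
The abstract names masked autoencoding as the pre-training approach. A minimal sketch of the random patch masking step that such pre-training typically relies on is shown here. The numbers are assumptions borrowed from common masked-autoencoder setups (a 224x224 input split into 16x16 patches, giving 196 patches, and a 75% mask ratio); the abstract does not state the settings MAE-Face uses, and the function name `random_patch_mask` is hypothetical.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Randomly split patch indices into (visible, masked) lists.

    Illustrative only: not taken from the MAE-Face implementation.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    # The first num_masked shuffled indices are hidden from the encoder.
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


# Assumed setup: 14 * 14 = 196 patches, 75% of them masked.
visible, masked = random_patch_mask(196, 0.75, seed=0)
print(len(visible), len(masked))
```

In a masked autoencoder, the encoder would see only the `visible` patches, and a decoder would be trained to reconstruct the pixels of the `masked` ones, so no AU labels are needed at this stage.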

Taxonomy

Topics: Emotion and Mood Recognition · Speech and Audio Processing · Gaze Tracking and Assistive Technology