AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder

Qiaoqiao Jin; Rui Shi; Yishun Dou; Bingbing Ni

arXiv:2407.11468·cs.CV·July 17, 2024

TL;DR

AU-vMAE introduces a video-masked autoencoder pre-training scheme for facial action unit detection, leveraging multi-label video data and temporal consistency to improve performance over existing methods.

Contribution

The paper proposes a novel video-level pre-training approach using masked autoencoders and prior AU pair matrices, addressing data scarcity and diversity in facial action unit detection.
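The summary above does not say how the prior AU pair matrices are built. As a minimal sketch, one common choice is to estimate, from multi-label annotations, how often one AU is active given that another is active. The function name `au_pair_prior`, the conditional-probability definition, and the toy labels below are all illustrative assumptions, not the authors' formulation.

```python
def au_pair_prior(labels):
    """Estimate a pairwise AU prior from binary multi-label annotations.

    labels: list of frames, each a list of 0/1 flags (one per AU).
    Returns prior[i][j], an estimate of P(AU_j active | AU_i active).
    This definition is an assumption made for illustration.
    """
    num_aus = len(labels[0])
    counts = [[0] * num_aus for _ in range(num_aus)]
    for frame in labels:
        active = [k for k, flag in enumerate(frame) if flag]
        # Count every ordered pair of AUs that are active in the same frame.
        for i in active:
            for j in active:
                counts[i][j] += 1
    prior = []
    for i in range(num_aus):
        total = counts[i][i]  # number of frames in which AU_i is active
        prior.append([counts[i][j] / total if total else 0.0
                      for j in range(num_aus)])
    return prior


# Hypothetical toy annotations: 4 frames, 3 AUs.
toy_labels = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
]
prior = au_pair_prior(toy_labels)
```

A matrix like this is asymmetric by construction: in the toy data the second AU never appears without the first, while the first appears alone in one frame.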

Findings

1. Significant performance improvements on the BP4D and DISFA datasets.

2. Effective utilization of multi-label and temporal information.

3. Outperforms state-of-the-art methods in AU detection.

Abstract

Current Facial Action Unit (FAU) detection methods generally struggle because labeled video training data are scarce and the number of training face IDs is limited, which leaves the trained feature extractor with insufficient coverage to model the large diversity of inter-person facial structures and movements. To explicitly address these challenges, we propose a novel video-level pre-training scheme that fully explores the multi-label property of FAUs in video as well as temporal label consistency. At the heart of our design is a pre-trained video feature extractor based on the video masked autoencoder, together with a fine-tuning network that jointly completes the multi-level video FAU analysis tasks, i.e., integrating both video-level and frame-level FAU detection, thus dramatically expanding the supervision set from sparse FAUs annotations to ALL…
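The abstract names a video masked autoencoder but does not state the masking strategy. The sketch below illustrates tube masking, where the same spatial patches are hidden in every frame, as used in VideoMAE-style pre-training. Treating it as the scheme used here is an assumption, and `tube_mask`, the 0.9 mask ratio, and the patch counts are illustrative choices rather than values from the paper.

```python
import random


def tube_mask(num_frames, num_patches, mask_ratio, seed=0):
    """Return a num_frames x num_patches grid of booleans (True = masked).

    The masked spatial positions are sampled once and repeated over time,
    so each masked patch forms a "tube" through the clip.
    """
    rng = random.Random(seed)
    num_masked = round(mask_ratio * num_patches)
    masked = set(rng.sample(range(num_patches), num_masked))
    frame_mask = [p in masked for p in range(num_patches)]
    return [list(frame_mask) for _ in range(num_frames)]


# Hypothetical clip: 8 frames, 14 x 14 = 196 patches per frame.
mask = tube_mask(num_frames=8, num_patches=196, mask_ratio=0.9)

# Only the visible (unmasked) positions would be passed to the encoder;
# the decoder is then trained to reconstruct the masked ones.
visible = [(t, p) for t, row in enumerate(mask)
           for p, is_masked in enumerate(row) if not is_masked]
```

Because the mask is shared across frames, a model cannot recover a hidden patch by copying it from a neighbouring frame, which is the usual motivation for tube masking on video.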

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications

Methods: Sparse Evolutionary Training