EventFormer: AU Event Transformer for Facial Action Unit Event Detection
Yingjie Chen, Jiarui Zhang, Tao Wang, and Yun Liang

TL;DR
EventFormer is a transformer-based model that detects facial action unit (AU) events directly from video sequences, capturing the dynamic nature of AUs to support emotion analysis.
Contribution
This work introduces the first approach that detects AU events directly from video with a transformer model, relying on global temporal information rather than independent frame-level predictions.
Findings
Outperforms previous methods on the BP4D dataset
Effectively captures dynamic AU processes
Demonstrates the importance of temporal information in AU detection
Abstract
Facial action units (AUs) play an indispensable role in human emotion analysis. We observe that although AU-based high-level emotion analysis is urgently needed by real-world applications, frame-level AU results provided by previous works cannot be directly used for such analysis. Moreover, as AUs are dynamic processes, the utilization of global temporal information is important but has been gravely ignored in the literature. To this end, we propose EventFormer for AU event detection, which is the first work directly detecting AU events from a video sequence by viewing AU event detection as a multiple class-specific sets prediction problem. Extensive experiments conducted on a commonly used AU benchmark dataset, BP4D, show the superiority of EventFormer under suitable metrics.
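To make the distinction between frame-level AU results and AU events concrete, here is a minimal sketch that groups per-frame binary AU labels into (start, end) events, one set per AU class. It is not the EventFormer method, which predicts events directly from video; the function name `frames_to_events`, the label layout, and the example AU names and values are assumptions chosen purely for illustration.

```python
from typing import Dict, List, Tuple


def frames_to_events(frame_labels: List[int]) -> List[Tuple[int, int]]:
    """Group runs of active frames (label 1) into inclusive (start, end) events."""
    events: List[Tuple[int, int]] = []
    start = None  # index where the current run of active frames began
    for i, active in enumerate(frame_labels):
        if active and start is None:
            start = i  # a new event begins
        elif not active and start is not None:
            events.append((start, i - 1))  # the event ended on the previous frame
            start = None
    if start is not None:
        # the sequence ended while an event was still active
        events.append((start, len(frame_labels) - 1))
    return events


# Hypothetical per-frame labels for two AU classes (illustrative values only).
frame_level: Dict[str, List[int]] = {
    "AU6": [0, 1, 1, 0, 0, 1, 1, 1, 0, 1],
    "AU12": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
}

# One set of events per AU class, mirroring the "class-specific sets" view.
events_per_au = {au: frames_to_events(labels) for au, labels in frame_level.items()}
print(events_per_au)
```

A post-processing step like this depends entirely on the quality of the frame-level predictions: a single misclassified frame splits one event into two, which is one reason direct event prediction with temporal context can be attractive.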
Taxonomy
Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Face recognition and analysis
