FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos
Yan Wang, Yixuan Sun, Yiwen Huang, Zhongying Liu, Shuyong Gao, Wei Zhang, Weifeng Ge, Wenqiang Zhang

TL;DR
This paper introduces FERV39k, a large-scale multi-scene video dataset for facial expression recognition, addressing the lack of video-based FER benchmarks and analyzing scene-specific performance challenges.
Contribution
The paper presents the creation of FERV39k, a comprehensive multi-scene video dataset for FER, including data collection, annotation process, and baseline evaluations.
Findings
Baseline methods show scene-dependent performance variations.
FERV39k enables systematic analysis of FER in diverse real-world scenes.
Benchmark results point to open challenges for future FER research.
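The scene-dependent variation noted above is usually checked by breaking accuracy down per scene. The following is a minimal sketch of that breakdown, not the authors' evaluation code. The function name `per_scene_accuracy`, the record layout, and the toy predictions are all assumptions made for illustration. Only the scene names "Talk-Show" and "Official-Event" and the "Happy" label come from the abstract; "Other" is a placeholder label.

```python
from collections import defaultdict


def per_scene_accuracy(records):
    """Return accuracy per scene.

    records: iterable of (scene, true_label, predicted_label) tuples.
    This layout is hypothetical, not the dataset's actual annotation format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for scene, true_label, predicted_label in records:
        total[scene] += 1
        if true_label == predicted_label:
            correct[scene] += 1
    return {scene: correct[scene] / total[scene] for scene in total}


# Made-up toy predictions for illustration only; not results from the paper.
toy_records = [
    ("Talk-Show", "Happy", "Happy"),
    ("Talk-Show", "Happy", "Happy"),
    ("Official-Event", "Happy", "Other"),
    ("Official-Event", "Happy", "Happy"),
]

scene_accuracy = per_scene_accuracy(toy_records)
for scene, accuracy in scene_accuracy.items():
    print(f"{scene}: {accuracy:.2f}")
```

Reporting one number per scene, rather than a single pooled accuracy, is what makes a gap between scenes visible.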
Abstract
Current benchmarks for facial expression recognition (FER) mainly focus on static images, while there are limited datasets for FER in videos. It is still ambiguous to evaluate whether performances of existing methods remain satisfactory in real-world application-oriented scenes. For example, the "Happy" expression with high intensity in Talk-Show is more discriminating than the same expression with low intensity in Official-Event. To fill this gap, we build a large-scale multi-scene dataset, coined as FERV39k. We analyze the important ingredients of constructing such a novel dataset in three aspects: (1) multi-scene hierarchy and expression class, (2) generation of candidate video clips, (3) trusted manual labelling process. Based on these guidelines, we select 4 scenarios subdivided into 22 scenes, annotate 86k samples automatically obtained from 4k videos based on the well-designed…
Taxonomy
Topics: Emotion and Mood Recognition · Gaze Tracking and Assistive Technology · Face recognition and analysis
