Facial Expression Recognition with Visual Transformers and Attentional Selective Fusion
Fuyan Ma, Bin Sun, Shutao Li

TL;DR
This paper introduces a facial expression recognition method that combines visual transformers with attentional feature fusion to handle occlusions and pose variations in unconstrained environments, and reports state-of-the-art results on RAF-DB, FERPlus, and AffectNet.
Contribution
It proposes the Visual Transformers with Feature Fusion (VTFF) framework, combining attentional selective fusion with global self-attention for improved FER in the wild.
Findings
Achieved new state-of-the-art accuracy on RAF-DB, FERPlus, and AffectNet datasets.
Demonstrated superior performance and generalization capability across multiple in-the-wild datasets.
Validated effectiveness of the proposed method through extensive experiments.
Abstract
Facial Expression Recognition (FER) in the wild is extremely challenging due to occlusions, variant head poses, face deformation and motion blur under unconstrained conditions. Although substantial progress has been made in automatic FER in the past few decades, previous studies were mainly designed for lab-controlled FER. Real-world occlusions, variant head poses and other issues increase the difficulty of FER because of the resulting information-deficient regions and complex backgrounds. Unlike previous purely CNN-based methods, we argue that it is feasible and practical to translate facial images into sequences of visual words and perform expression recognition from a global perspective. Therefore, we propose the Visual Transformers with Feature Fusion (VTFF) to tackle FER in the wild in two main steps. First, we propose the attentional selective fusion (ASF) for…
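The idea of treating a face as a sequence of visual words and reasoning over it globally can be illustrated with a minimal sketch. This is not the authors' VTFF implementation: the feature-map size, the random projection matrices, and the helper names `feature_map_to_visual_words` and `global_self_attention` are all assumptions made for illustration, and the ASF step is not shown because the abstract above is truncated before describing it.

```python
import numpy as np


def feature_map_to_visual_words(feature_map):
    """Flatten a (C, H, W) feature map into an (H*W, C) sequence of visual words."""
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_self_attention(words, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all visual words."""
    q, k, v = words @ w_q, words @ w_k, words @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = softmax(scores, axis=-1)  # each word attends to every other word
    return attn @ v, attn


# Hypothetical stand-in for a CNN feature map of a face image (assumed shape).
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 7, 7))

words = feature_map_to_visual_words(feature_map)
d = words.shape[1]
# Random projections stand in for learned weights (assumption).
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

out, attn = global_self_attention(words, w_q, w_k, w_v)
print(words.shape, out.shape, attn.shape)
```

Because every row of the attention matrix spans all spatial positions, a word from an occluded region can draw on evidence from visible regions elsewhere in the face, which is the intuition behind the "global perspective" in the abstract.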
