Facial Expression Recognition with Visual Transformers and Attentional Selective Fusion

Fuyan Ma; Bin Sun; Shutao Li

arXiv:2103.16854·cs.CV·May 12, 2022

TL;DR

This paper introduces a facial expression recognition method that combines visual transformers with attentional selective fusion. It targets occlusions and pose variations in unconstrained environments and reports state-of-the-art accuracy on RAF-DB, FERPlus, and AffectNet.

Contribution

It proposes the Visual Transformers with Feature Fusion (VTFF) framework, combining attentional selective fusion with global self-attention for improved FER in the wild.
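
As a rough illustration of the "global self-attention" part of this contribution, the sketch below flattens a tiny feature map into a sequence of tokens ("visual words") and applies single-head scaled dot-product self-attention over them. It is a minimal sketch, not the VTFF implementation: the function names, the toy feature map, and the use of identity projections (queries, keys, and values all equal to the tokens) are assumptions made for illustration. The attentional selective fusion step is not sketched here because this page does not describe its details.

```python
import math

def feature_map_to_tokens(fmap):
    """Flatten an H x W x C feature map (nested lists) into H*W tokens of length C."""
    return [list(cell) for row in fmap for cell in row]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections (queries = keys = values = tokens).
    A trained transformer would learn separate projection matrices.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token in the sequence (global view).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output token is a weighted average of all tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

# Toy 2 x 2 feature map with 3 channels (hypothetical values).
fmap = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
]
tokens = feature_map_to_tokens(fmap)   # 4 tokens, each of length 3
attended = self_attention(tokens)      # same shape as tokens
```

Because every token attends to every other token, an occluded or uninformative region can draw on evidence from the rest of the face, which is the intuition behind recognizing expressions "from a global perspective".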

Findings

01. Achieved new state-of-the-art accuracy on RAF-DB, FERPlus, and AffectNet datasets.

02. Demonstrated superior performance and generalization capability across multiple in-the-wild datasets.

03. Validated effectiveness of the proposed method through extensive experiments.

Abstract

Facial Expression Recognition (FER) in the wild is extremely challenging due to occlusions, variant head poses, face deformation and motion blur under unconstrained conditions. Although substantial progress has been made in automatic FER in the past few decades, previous studies were mainly designed for lab-controlled FER. Real-world occlusions, variant head poses and other issues definitely increase the difficulty of FER on account of these information-deficient regions and complex backgrounds. Different from previous pure CNN-based methods, we argue that it is feasible and practical to translate facial images into sequences of visual words and perform expression recognition from a global perspective. Therefore, we propose the Visual Transformers with Feature Fusion (VTFF) to tackle FER in the wild in two main steps. First, we propose the attentional selective fusion (ASF) for…
