MVT: Mask Vision Transformer for Facial Expression Recognition in the wild

Hanting Li, Mingzhe Sui, Feng Zhao, Zhengjun Zha, and Feng Wu

arXiv:2106.04520·cs.CV·July 13, 2021·48 cites

Open Access

TL;DR

This paper introduces MVT, a pure transformer-based model for facial expression recognition in challenging wild conditions, utilizing mask generation and label rectification to improve accuracy.

Contribution

The paper proposes a novel Mask Vision Transformer (MVT) with a mask generation network and dynamic relabeling, enhancing FER performance in complex real-world scenarios.

Findings

1. Outperforms the state of the art on RAF-DB with 88.62% accuracy.

2. Achieves 89.22% on FERPlus, surpassing previous methods.

3. Attains 64.57% on AffectNet-7 and 61.40% on AffectNet-8.

Abstract

Facial Expression Recognition (FER) in the wild is an extremely challenging task in computer vision due to varying backgrounds, low-quality facial images, and the subjectivity of annotators. These uncertainties make it difficult for neural networks to learn robust features on limited-scale datasets. Moreover, the networks can be easily distracted by these factors and make incorrect decisions. Recently, the vision transformer (ViT) and data-efficient image transformers (DeiT) have shown strong performance on traditional classification tasks. The self-attention mechanism gives transformers a global receptive field in the first layer, which dramatically enhances their feature extraction capability. In this work, we first propose a novel pure transformer-based mask vision transformer (MVT) for FER in the wild, which consists of two modules: a transformer-based mask…
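The abstract's point about a global receptive field in the first layer can be illustrated with a minimal sketch of single-head scaled dot-product self-attention over patch tokens. This is not the MVT implementation: the function name `self_attention`, the sizes, and the random weights are all hypothetical and chosen only for illustration.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q = tokens @ w_q                      # queries, shape (n, d)
    k = tokens @ w_k                      # keys,    shape (n, d)
    v = tokens @ w_v                      # values,  shape (n, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row-wise max before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

# Hypothetical sizes: a 4x4 grid of image patches, each embedded in 8 dimensions.
rng = np.random.default_rng(0)
num_patches, dim = 16, 8
tokens = rng.normal(size=(num_patches, dim))
# Random projection matrices stand in for learned weights (assumption).
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

out, weights = self_attention(tokens, w_q, w_k, w_v)
```

Each row of `weights` is a softmax over all 16 patches, so every output token is a mixture of every input patch after a single layer. A convolution, by contrast, would only mix patches inside its kernel window at this depth.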

Taxonomy

Topics: Emotion and Mood Recognition · Advanced Computing and Algorithms · Machine Learning and ELM

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · Dense Connections · Softmax · Multi-Head Attention · Vision Transformer