Vision Transformer with Attentive Pooling for Robust Facial Expression Recognition

Fanglei Xue, Qiangchang Wang, Zichang Tan, Zhongsong Ma, and Guodong Guo

arXiv:2212.05463·cs.CV·December 13, 2022

TL;DR

This paper introduces two simple, parameter-free attentive pooling modules for Vision Transformers in facial expression recognition, improving accuracy and efficiency by focusing on discriminative features and reducing noise.

Contribution

The paper proposes novel attentive pooling modules (APP and ATP) that enhance Vision Transformer performance in FER without additional learnable parameters.

Findings

01

Outperforms state-of-the-art methods on six in-the-wild datasets.

02

Reduces computational cost while boosting discriminative feature focus.

03

Demonstrates effectiveness through qualitative and quantitative analysis.

Abstract

Facial Expression Recognition (FER) in the wild is an extremely challenging task. Recently, some Vision Transformers (ViT) have been explored for FER, but most of them perform worse than Convolutional Neural Networks (CNN). This is mainly because the newly proposed modules are difficult to train to convergence from scratch, owing to their lack of inductive bias, and tend to focus on occluded and noisy areas. TransFER, a representative transformer-based method for FER, alleviates this with multi-branch attention dropping but incurs excessive computation. In contrast, we present two attentive pooling (AP) modules to pool noisy features directly. The AP modules include Attentive Patch Pooling (APP) and Attentive Token Pooling (ATP). They aim to guide the model to emphasize the most discriminative features while reducing the impact of less relevant features. The proposed APP is employed…
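The general idea of pooling by importance, keeping the most discriminative features and dropping the rest without any learnable parameters, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `attentive_pool`, the `keep_ratio` parameter, and the assumption that an importance score per feature is already available (for example, from attention weights) are all hypothetical and chosen only for illustration.

```python
import math


def attentive_pool(features, scores, keep_ratio=0.5):
    """Keep the highest-scoring features (hypothetical top-k pooling sketch).

    features:   list of N feature vectors (patches or tokens).
    scores:     list of N importance scores, assumed to be given.
    keep_ratio: fraction of features to keep (illustrative parameter).
    Returns the kept features and their original indices.
    """
    if len(features) != len(scores):
        raise ValueError("features and scores must have the same length")
    # Number of features to keep; always keep at least one.
    k = max(1, math.ceil(len(features) * keep_ratio))
    # Rank indices by score, highest first, and take the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore the original ordering of the kept features.
    kept = sorted(ranked[:k])
    return [features[i] for i in kept], kept


# Four toy feature vectors with made-up importance scores.
features = [[0, 1], [2, 3], [4, 5], [6, 7]]
scores = [0.10, 0.70, 0.05, 0.15]
pooled, kept = attentive_pool(features, scores, keep_ratio=0.5)
print(kept)    # indices of the features that survive pooling
print(pooled)  # the surviving feature vectors
```

Because the selection only sorts and slices, it adds no trainable weights, and every later layer processes fewer features, which is where the computational saving in this kind of pooling would come from.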

Code & Models

Repositories

youqingxiaozhua/apvit
paddle · Official
