Vision Transformer Equipped with Neural Resizer on Facial Expression Recognition Task

Hyeonbin Hwang; Soyeon Kim; Wei-Jin Park; Jiho Seo; Kyungtae Ko; Hyeon; Yeo

arXiv:2204.02181·cs.CV·April 6, 2022

Open Access

TL;DR

This paper introduces Neural Resizer, a data-driven approach to improve Vision Transformers for facial expression recognition in low-quality, imbalanced data scenarios, achieving near state-of-the-art results.

Contribution

The paper proposes Neural Resizer, a novel training framework that enhances Vision Transformers by compensating for low-resolution data in facial expression recognition tasks.

Findings

01. Neural Resizer improves Transformer performance on facial expression data.

02. The approach nearly achieves state-of-the-art accuracy.

03. Data-driven resizing outperforms traditional interpolation methods.

Abstract

In the wild, Facial Expression Recognition is often challenged by low-quality data and imbalanced, ambiguous labels. The field has benefited greatly from CNN-based approaches; however, CNN models have a structural limitation in relating distant facial regions. As a remedy, the Transformer has been introduced to vision with its global receptive field, but it requires adjusting the input spatial size to match the pretrained models in order to benefit from their strong inductive bias. We therefore ask whether a deterministic interpolation method is enough to feed low-resolution data to a Transformer. In this work, we propose a novel training framework, Neural Resizer, which supports the Transformer by compensating for information loss and downscaling in a data-driven manner, trained with a loss function that balances noisiness and imbalance. Experiments show our Neural Resizer with F-PDLS loss…
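For orientation, the deterministic interpolation that the abstract questions can be sketched in a few lines. This is a minimal pure-Python illustration of bilinear resizing, not the paper's Neural Resizer, whose architecture is not described in this listing. The function name `bilinear_resize`, the align-corners convention, and the toy 2x2 input are assumptions made for the example.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D grayscale image (a list of rows) by bilinear interpolation.

    Assumption: align-corners convention, so the corner pixels of the input
    map exactly onto the corner pixels of the output.
    """
    in_h, in_w = len(img), len(img[0])
    # Step sizes that map output coordinates back into input coordinates.
    sy = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0

    out = []
    for i in range(out_h):
        y = i * sy
        y0 = min(int(y), in_h - 1)   # row above (or at) the sample point
        y1 = min(y0 + 1, in_h - 1)   # row below, clamped at the border
        wy = y - y0                  # fixed vertical blending weight
        row = []
        for j in range(out_w):
            x = j * sx
            x0 = min(int(x), in_w - 1)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0              # fixed horizontal blending weight
            top = (1 - wx) * img[y0][x0] + wx * img[y0][x1]
            bottom = (1 - wx) * img[y1][x0] + wx * img[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


# Toy example: upscale a 2x2 "image" to 3x3.
low_res = [[0.0, 1.0],
           [2.0, 3.0]]
up = bilinear_resize(low_res, 3, 3)
```

Every blending weight here is fixed by geometry alone, so the resize cannot adapt to the data. A data-driven resizer, as the abstract describes it, would instead learn how to produce the resized input during training.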

Taxonomy

Topics: Hand Gesture Recognition Systems · Emotion and Mood Recognition · Face and Expression Recognition

Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Label Smoothing · Softmax · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Multi-Head Attention