Facial Affect Recognition based on Multi Architecture Encoder and Feature Fusion for the ABAW7 Challenge

Kang Shen; Xuxiong Liu; Boyan Wang; Jun Yao; Xin Liu; Yujie Guan; Yu Wang; Gengchen Li; Xiao Sun

arXiv:2407.12258·cs.CV·July 29, 2024

TL;DR

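This paper extracts visual features with several state-of-the-art models, aligns them to a common dimension with an affine module, and fuses them with a Transformer Encoder for the three ABAW7 sub-challenges (VA estimation, Expr classification, and AU detection). The authors report results that significantly outperform the baselines.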

Contribution

A Transformer Encoder that integrates visual features from multiple architectures, combined with an affine module that maps features of varying dimensions to a common dimension before fusion.

Findings

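01. The reported results significantly outperform the baselines.

02. The affine module aligns features of varying dimensions to a common dimension.

03. The approach is applied to all three ABAW7 sub-challenges: VA estimation, Expr classification, and AU detection.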

Abstract

In this paper, we present our approach to addressing the challenges of the 7th ABAW competition. The competition comprises three sub-challenges: Valence Arousal (VA) estimation, Expression (Expr) classification, and Action Unit (AU) detection. To tackle these challenges, we employ state-of-the-art models to extract powerful visual features. Subsequently, a Transformer Encoder is utilized to integrate these features for the VA, Expr, and AU sub-challenges. To mitigate the impact of varying feature dimensions, we introduce an affine module to align the features to a common dimension. Overall, our results significantly outperform the baselines.
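The alignment-then-fusion idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the extractor names, feature dimensions, common dimension, and random weights below are all hypothetical, and a single parameter-free self-attention step stands in for the full Transformer Encoder. It only shows how per-extractor affine maps bring features of different sizes to one dimension so that they can be fused as a sequence of tokens.

```python
import math
import random

def make_affine(in_dim, out_dim, rng):
    """Create (W, b) for y = W x + b; random weights stand in for learned ones."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return W, b

def apply_affine(params, x):
    """Map a feature vector x to the common dimension."""
    W, b = params
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def self_attention(tokens):
    """Single-head scaled dot-product attention with Q = K = V = tokens.

    A simplified stand-in for a Transformer Encoder layer (no learned
    projections, residuals, or feed-forward block).
    """
    d = len(tokens[0])
    fused = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        fused.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return fused

rng = random.Random(0)
COMMON_DIM = 8  # hypothetical shared dimension

# Hypothetical extractors with different output sizes.
feature_dims = {"extractor_a": 16, "extractor_b": 32, "extractor_c": 24}
affines = {name: make_affine(dim, COMMON_DIM, rng) for name, dim in feature_dims.items()}

# Fake per-frame features, one vector per extractor.
frame_features = {name: [rng.random() for _ in range(dim)] for name, dim in feature_dims.items()}

# Affine alignment: every extractor's output becomes a COMMON_DIM-sized token.
tokens = [apply_affine(affines[name], vec) for name, vec in frame_features.items()]

# Fusion: each token attends to all tokens, mixing information across extractors.
fused = self_attention(tokens)
```

Without the affine step, the three vectors (sizes 16, 32, and 24 here) could not be stacked into a single token sequence, which is the dimension mismatch the abstract describes. Whether the authors treat the aligned features as separate tokens or concatenate them is not stated in the abstract; the token arrangement here is an assumption.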

Taxonomy

Topics: Emotion and Mood Recognition

Methods: Attention Is All You Need · Residual Connection · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Adam · Dropout · Multi-Head Attention · Dense Connections