Facial Affect Recognition based on Multi Architecture Encoder and Feature Fusion for the ABAW7 Challenge
Kang Shen, Xuxiong Liu, Boyan Wang, Jun Yao, Xin Liu, Yujie Guan, Yu Wang, Gengchen Li, Xiao Sun

TL;DR
This paper introduces a multi-architecture encoder with feature fusion and an affine module to improve facial affect recognition across three ABAW7 sub-challenges, achieving results that significantly outperform the baselines.
Contribution
The integration of a Transformer Encoder for feature fusion with an affine module that aligns features of varying dimensions to a common dimension for facial affect recognition.
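The source describes the affine module only as aligning features of varying dimensions to a common dimension. The NumPy sketch below is a minimal illustration that assumes the module is a per-encoder linear projection y = xW + b. It also assumes that each encoder yields one feature vector per frame.

The encoder names, the 768/1024/512 input sizes and the common dimension of 256 are hypothetical placeholders, not values from the paper. In a trained model W and b would be learned. Here they are randomly initialized only to show how the shapes line up.

```python
import numpy as np

def make_affine(in_dim, out_dim, rng):
    """Hypothetical affine module y = x @ W + b mapping in_dim -> out_dim.

    Glorot-style uniform init is an assumption; the paper does not state one.
    """
    limit = np.sqrt(6.0 / (in_dim + out_dim))
    W = rng.uniform(-limit, limit, size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return W, b

def align_features(features, common_dim=256, seed=0):
    """Project each encoder's (T, d_i) features to (T, common_dim) and stack them.

    Returns an array of shape (T, num_encoders, common_dim). Weights are
    freshly initialized here; in a real model they would be trained parameters.
    """
    rng = np.random.default_rng(seed)
    aligned = []
    for x in features.values():
        W, b = make_affine(x.shape[-1], common_dim, rng)
        aligned.append(x @ W + b)  # (T, common_dim)
    return np.stack(aligned, axis=1)

# Hypothetical per-frame outputs of three encoders with different feature sizes.
T = 16
data_rng = np.random.default_rng(42)
features = {
    "encoder_a": data_rng.standard_normal((T, 768)),
    "encoder_b": data_rng.standard_normal((T, 1024)),
    "encoder_c": data_rng.standard_normal((T, 512)),
}
tokens = align_features(features, common_dim=256)
print(tokens.shape)  # (16, 3, 256)
```

Once every encoder lives in the same 256-dimensional space, the aligned features can be concatenated, averaged or attended over without dimension mismatches. Which of these combinations the authors use is not stated in the summary.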
Findings
Significant performance improvements over the baselines.
Effective alignment of features with different dimensions to a common dimension.
Successful application to ABAW7 challenges.
Abstract
In this paper, we present our approach to addressing the challenges of the 7th ABAW competition. The competition comprises three sub-challenges: Valence Arousal (VA) estimation, Expression (Expr) classification, and Action Unit (AU) detection. To tackle these challenges, we employ state-of-the-art models to extract powerful visual features. Subsequently, a Transformer Encoder is utilized to integrate these features for the VA, Expr, and AU sub-challenges. To mitigate the impact of varying feature dimensions, we introduce an affine module to align the features to a common dimension. Overall, our results significantly outperform the baselines.
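The abstract does not give the depth, head count, or output heads of the fusion model. The NumPy sketch below therefore makes the following assumptions:

- The fusion model is a single post-LayerNorm Transformer encoder layer. It consists of multi-head self-attention and a ReLU feed-forward block, each with a residual connection.
- The layer runs over a sequence of per-frame features that have already been aligned to a common dimension.
- Three linear heads follow the encoder: two tanh outputs for valence and arousal, softmax scores over expression classes, and independent sigmoids for AUs.

The 8 expression classes and 12 AUs mirror the Aff-Wild2 label sets used in ABAW, but they are assumptions here. All function names are hypothetical. Biases and LayerNorm gains are omitted for brevity, and weights are random rather than trained.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LayerNorm over the last axis (learnable gain/bias omitted)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def init(rng, *shape):
    return rng.standard_normal(shape) * 0.02

def encoder_layer(x, p, num_heads):
    """One post-LN Transformer encoder layer over a (T, d) frame sequence."""
    T, d = x.shape
    hd = d // num_heads
    # Project and split into heads: (num_heads, T, hd).
    q = (x @ p["Wq"]).reshape(T, num_heads, hd).transpose(1, 0, 2)
    k = (x @ p["Wk"]).reshape(T, num_heads, hd).transpose(1, 0, 2)
    v = (x @ p["Wv"]).reshape(T, num_heads, hd).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd))  # (num_heads, T, T)
    heads = (attn @ v).transpose(1, 0, 2).reshape(T, d)      # merge heads
    x = layer_norm(x + heads @ p["Wo"])                      # residual + LayerNorm
    ff = np.maximum(0.0, x @ p["W1"]) @ p["W2"]              # ReLU feed-forward
    return layer_norm(x + ff)                                 # residual + LayerNorm

def build_params(d, d_ff, n_expr, n_au, rng):
    return {
        "Wq": init(rng, d, d), "Wk": init(rng, d, d), "Wv": init(rng, d, d),
        "Wo": init(rng, d, d), "W1": init(rng, d, d_ff), "W2": init(rng, d_ff, d),
        "W_va": init(rng, d, 2), "W_expr": init(rng, d, n_expr), "W_au": init(rng, d, n_au),
    }

def predict(fused, p, num_heads=4):
    """Per-frame VA, Expr and AU outputs from aligned (T, d) features."""
    h = encoder_layer(fused, p, num_heads)
    va = np.tanh(h @ p["W_va"])                   # valence, arousal in [-1, 1]
    expr = softmax(h @ p["W_expr"])               # expression class probabilities
    au = 1.0 / (1.0 + np.exp(-(h @ p["W_au"])))   # independent AU probabilities
    return va, expr, au

# Hypothetical aligned features for T=16 frames in a 256-dim common space.
rng = np.random.default_rng(0)
T, d = 16, 256
fused = rng.standard_normal((T, d))
params = build_params(d, d_ff=512, n_expr=8, n_au=12, rng=rng)
va, expr, au = predict(fused, params)
print(va.shape, expr.shape, au.shape)  # (16, 2) (16, 8) (16, 12)
```

Taking `expr.argmax(axis=-1)` gives one predicted expression per frame. Thresholding `au` (for example at 0.5, an assumed value) gives binary AU labels. Sharing one encoder across the three heads is itself an assumption. The abstract only says the Transformer Encoder integrates features for all three sub-challenges.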
Taxonomy
Topics: Emotion and Mood Recognition
Methods: Attention Is All You Need · Residual Connection · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Adam · Dropout · Multi-Head Attention · Dense Connections
