Distract Your Attention: Multi-head Cross Attention Network for Facial Expression Recognition
Zhengyao Wen, Wenzhong Lin, Tao Wang, Ge Xu

TL;DR
This paper introduces DAN, a novel facial expression recognition network that leverages multi-head cross attention to focus on multiple facial regions simultaneously, improving recognition accuracy by capturing subtle and high-order local features.
Contribution
The paper proposes a new multi-component network architecture combining feature clustering, multi-head attention, and attention fusion for improved facial expression recognition.
Findings
Achieves state-of-the-art results on AffectNet, RAF-DB, and SFEW 2.0 datasets.
Demonstrates the effectiveness of multi-head attention in capturing diverse facial regions.
Shows robustness of the proposed method across multiple public datasets.
Abstract
We present a novel facial expression recognition network, called Distract your Attention Network (DAN). Our method is based on two key observations. Firstly, multiple classes share inherently similar underlying facial appearance, and their differences could be subtle. Secondly, facial expressions exhibit themselves through multiple facial regions simultaneously, and the recognition requires a holistic approach by encoding high-order interactions among local features. To address these issues, we propose our DAN with three key components: Feature Clustering Network (FCN), Multi-head cross Attention Network (MAN), and Attention Fusion Network (AFN). The FCN extracts robust features by adopting a large-margin learning objective to maximize class separability. In addition, the MAN instantiates a number of attention heads to simultaneously attend to multiple facial areas and build attention…
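The multi-head idea in the abstract can be illustrated with a toy, pure-Python sketch. This is not the authors' implementation. The scoring vectors in `heads`, the softmax-over-positions attention, and the fusion by plain summation are all assumptions made for illustration; the abstract only states that several heads attend to different facial areas and that their outputs are fused.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, head_weights):
    """One hypothetical attention head.

    features: list of P spatial positions, each a list of C channel values.
    head_weights: list of C floats used to score each position (assumed form).
    Returns the attention map over positions and the attention-pooled feature.
    """
    scores = [sum(w * x for w, x in zip(head_weights, pos)) for pos in features]
    attn = softmax(scores)
    channels = len(features[0])
    pooled = [sum(a * pos[c] for a, pos in zip(attn, features))
              for c in range(channels)]
    return attn, pooled

def multi_head_fuse(features, heads):
    # Run every head on the same features, then fuse by summing the pooled
    # vectors (a simple stand-in for a learned fusion step).
    maps, pooled = [], []
    for weights in heads:
        attn, vec = attend(features, weights)
        maps.append(attn)
        pooled.append(vec)
    fused = [sum(vec[c] for vec in pooled) for c in range(len(pooled[0]))]
    return maps, fused

# Three spatial positions with two channels each (toy values).
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Two heads, each favouring a different channel and hence a different region.
heads = [[4.0, 0.0], [0.0, 4.0]]
maps, fused = multi_head_fuse(features, heads)
```

With these toy values the first head puts most of its weight on position 0 and the second on position 1, which is the "attend to different regions" behaviour the abstract describes. A trained network would learn the scoring weights rather than fix them by hand.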
Taxonomy
Topics: Emotion and Mood Recognition · Face and Expression Recognition · Face recognition and analysis
Methods: Convolution · Max Pooling · Fully Convolutional Network
