Facial Expression Recognition using Squeeze and Excitation-powered Swin Transformers
Arpita Vats, Aman Chadha

TL;DR
This paper introduces an efficient facial emotion recognition model that augments a Swin Transformer with squeeze-and-excitation blocks, and reports strong performance on a challenging dataset while training on minimal data.
Contribution
It presents a novel facial emotion recognition (FER) framework that combines a Swin Vision Transformer with squeeze-and-excitation (SE) blocks to improve efficiency and accuracy when training data is limited.
Findings
Achieved an F1-score of 0.5420 on the AffectNet dataset.
Surpassed the performance of the ABAW 2022 competition winner.
Demonstrated the effectiveness of SE blocks in transformer-based FER models.
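For readers unfamiliar with squeeze-and-excitation, the general mechanism can be sketched in a few lines. This is a minimal, dependency-free illustration of a generic SE block, not the authors' implementation: the function name `se_block`, the weight matrices `w1` and `w2`, and the toy input are all assumptions made for the example, and how the block is wired into the Swin Transformer is not specified here.

```python
import math

def se_block(x, w1, w2):
    """Generic squeeze-and-excitation over a C x N feature map (hypothetical helper).

    x  : list of C channels, each a list of N spatial (or token) values
    w1 : (C/r) x C weights of the reduction layer
    w2 : C x (C/r) weights of the expansion layer
    """
    # Squeeze: global average pooling gives one scalar per channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h)))) for row in w2]
    # Scale: every value in a channel is multiplied by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(s, x)]

# Toy input: 2 channels, 2 positions each, reduction to a single hidden unit.
x = [[1.0, 3.0], [2.0, 2.0]]
w1 = [[0.5, 0.5]]      # illustrative weights, not learned values
w2 = [[1.0], [-1.0]]
y = se_block(x, w1, w2)
```

Because each gate lies strictly between 0 and 1, the block can only attenuate channels relative to one another; in a trained network the weights are learned so that informative channels keep more of their magnitude.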
Abstract
The ability to recognize and interpret facial emotions is a critical component of human communication, as it allows individuals to understand and respond to emotions conveyed through facial expressions and vocal tones. The recognition of facial emotions is a complex cognitive process that involves the integration of visual and auditory information, as well as prior knowledge and social cues. It plays a crucial role in social interaction, affective processing, and empathy, and is an important aspect of many real-world applications, including human-computer interaction, virtual assistants, and mental health diagnosis and treatment. The development of accurate and efficient models for facial emotion recognition is therefore of great importance and has the potential to have a significant impact on various fields of study. The field of Facial Emotion Recognition (FER) is of great significance…
Taxonomy
Topics: Emotion and Mood Recognition · Gaze Tracking and Assistive Technology · EEG and Brain-Computer Interfaces
Methods: Attention Is All You Need · Dense Connections · Softmax · Layer Normalization · Linear Layer · Multi-Head Attention · Residual Connection · Vision Transformer
