TL;DR
This paper introduces a novel deep network with global-local attention for emotion recognition, effectively integrating facial and contextual cues to improve accuracy over existing methods.
Contribution
The proposed global-local attention mechanism fuses facial features with context features extracted by a separate branch, improving emotion recognition accuracy and producing more interpretable attention maps.
Findings
Outperforms state-of-the-art methods on recent emotion datasets
Produces more meaningful attention maps
Demonstrates improved discrimination of emotional cues
Abstract
Human emotion recognition is an active research area in artificial intelligence and has made substantial progress over the past few years. Many recent works focus mainly on facial regions to infer human affect, while the surrounding context information is not effectively utilized. In this paper, we propose a new deep network that recognizes human emotions using a novel global-local attention mechanism. Our network extracts features from the facial and context regions independently, then learns them jointly through the attention module. In this way, both facial and contextual information are used to infer human emotions, enhancing the discrimination of the classifier. Extensive experiments show that our method surpasses the current state-of-the-art methods on recent emotion datasets by a fair margin. Qualitatively, our global-local attention…
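The abstract does not give the attention equations, so the following is only a minimal NumPy sketch of one plausible global-local fusion, under stated assumptions: the facial branch yields a single global vector, the context branch yields an h×w grid of local feature vectors, and each location is scored by a small additive (tanh) layer over the concatenated face and local features. The function name `global_local_attention`, the parameters `w_hidden` and `v`, the dimensions, and the 7-class output are all hypothetical, and random weights stand in for learned ones. This is not the authors' implementation.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()


def global_local_attention(face_feat, context_map, w_hidden, v):
    """Hypothetical global-local fusion (an assumption, not the paper's exact module).

    face_feat:   (d,)       global descriptor from the facial branch
    context_map: (h, w, d)  local descriptors from the context branch
    w_hidden:    (2d, k)    hidden layer of an additive attention scorer
    v:           (k,)       output vector of the scorer
    Returns the fused feature (2d,) and the (h, w) attention map.
    """
    h, w, d = context_map.shape
    local = context_map.reshape(h * w, d)  # one row per spatial location
    # Pair the global face feature with every local context feature.
    paired = np.concatenate([np.tile(face_feat, (h * w, 1)), local], axis=1)  # (h*w, 2d)
    # The tanh lets the face feature change the relative scores across locations.
    scores = np.tanh(paired @ w_hidden) @ v  # (h*w,)
    alpha = softmax(scores)  # non-negative weights that sum to 1
    attended_context = alpha @ local  # (d,) attention-weighted context summary
    fused = np.concatenate([face_feat, attended_context])
    return fused, alpha.reshape(h, w)


# Usage with random stand-in features and weights (all hypothetical).
rng = np.random.default_rng(0)
d, h, w, k, n_classes = 8, 4, 4, 16, 7
face = rng.standard_normal(d)
ctx = rng.standard_normal((h, w, d))
w_hidden = rng.standard_normal((2 * d, k)) * 0.1
v = rng.standard_normal(k)
w_cls = rng.standard_normal((n_classes, 2 * d))  # linear emotion classifier

fused, att_map = global_local_attention(face, ctx, w_hidden, v)
logits = w_cls @ fused
pred = int(np.argmax(logits))
```

The nonlinearity is a deliberate choice: with a purely linear score, the face term would add the same constant to every location and cancel inside the softmax, so the attention map would ignore the face. The returned `att_map` is the kind of spatial weighting that can be overlaid on the image to inspect which context regions the model attends to.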
Taxonomy
Methods: Global-Local Attention
