Compound Expression Recognition via Multi Model Ensemble for the ABAW7 Challenge
Xuxiong Liu, Kang Shen, Jun Yao, Boyan Wang, Minrui Liu, Liuwei An, Zishun Cui, Weijie Feng, Xiao Sun

TL;DR
This paper presents an ensemble learning approach that combines convolutional networks, Vision Transformers, and multiscale local attention networks for compound expression recognition, achieving high accuracy on RAF-DB and zero-shot recognition on part of C-EXPR-DB.
Contribution
It introduces a multi-model ensemble method for CER that captures both local and global facial cues, improving recognition accuracy and enabling zero-shot recognition of compound expressions.
Findings
High accuracy on the RAF-DB dataset
Zero-shot recognition on portions of C-EXPR-DB
Ensemble improves expression classification performance
Abstract
Compound Expression Recognition (CER) is vital for effective interpersonal interactions. Human emotional expressions are inherently complex due to the presence of compound expressions, requiring the consideration of both local and global facial cues for accurate judgment. In this paper, we propose an ensemble learning-based solution to address this complexity. Our approach involves training three distinct expression classification models using convolutional networks, Vision Transformers, and multiscale local attention networks. By employing late fusion for model ensemble, we combine the outputs of these models to predict the final results. Our method demonstrates high accuracy on the RAF-DB dataset and is capable of recognizing expressions in certain portions of the C-EXPR-DB dataset through zero-shot learning.
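The abstract states only that the outputs of the three models are combined by late fusion. The sketch below shows one common form of late fusion: a (optionally weighted) average of each model's softmax probabilities, followed by an argmax. It assumes each model emits one logit per class; the fusion rule, the weights, the seven-class `LABELS` order, and the names `late_fusion` and `predict` are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical label order for seven basic expression classes (assumption).
LABELS = ["Surprise", "Fear", "Disgust", "Happiness", "Sadness", "Anger", "Neutral"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one model's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(all_logits: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-model class probabilities (assumed fusion rule)."""
    if weights is None:
        weights = [1.0] * len(all_logits)  # equal weights by default
    if len(weights) != len(all_logits):
        raise ValueError("one weight per model is required")
    wsum = sum(weights)
    fused = [0.0] * len(all_logits[0])
    for w, logits in zip(weights, all_logits):
        for k, p in enumerate(softmax(logits)):
            fused[k] += (w / wsum) * p
    return fused


def predict(all_logits: Sequence[Sequence[float]],
            weights: Optional[Sequence[float]] = None) -> str:
    """Return the label with the highest fused probability."""
    fused = late_fusion(all_logits, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best]


# Made-up logits standing in for the CNN, ViT, and local attention models.
cnn_logits = [2.0, 0.1, 0.0, 0.5, 0.2, 0.1, 0.3]
vit_logits = [1.0, 1.5, 0.0, 0.2, 0.1, 0.0, 0.4]
mla_logits = [1.8, 0.9, 0.1, 0.3, 0.2, 0.1, 0.2]

print(predict([cnn_logits, vit_logits, mla_logits]))  # -> Surprise
```

Here the ViT alone would pick "Fear", but the averaged probabilities favor "Surprise" because the other two models agree on it. Averaging probabilities rather than raw logits keeps each model's contribution on the same scale. Per-model weights (for example, tuned on a validation split) are one possible refinement, again an assumption rather than something the summary confirms.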
