An Ensemble Approach for Multiple Emotion Descriptors Estimation Using Multi-task Learning
Irfan Haider, Minh-Trieu Tran, Soo-Hyung Kim, Hyung-Jeong Yang, Guee-Sang Lee

TL;DR
This paper presents a multi-task learning model that combines face and surrounding-context information, using InceptionNet V3 deep features, an attention mechanism, and a transformer block to estimate multiple emotion descriptors simultaneously. It reports a score of 0.917 on the MTL Challenge validation dataset.
Contribution
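The approach uses the face together with the context around it, rather than the face alone, and combines InceptionNet V3 features, attention, a transformer block, and multi-layer perceptron networks to estimate several emotion descriptors in a single framework.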
Findings
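Achieves a score of 0.917 on the MTL Challenge validation dataset
Simultaneously predicts arousal and valence, classifies the emotional expression, and estimates action units
Refines InceptionNet V3 features with an attention mechanism before passing them to a transformer block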
Abstract
This paper describes our submission to the fourth Affective Behavior Analysis in-the-Wild (ABAW) Competition, specifically the Multi-Task Learning (MTL) Challenge. Instead of using only face information, we use the full information in the provided dataset, which contains both the face and the context around it. We use the InceptionNet V3 model to extract deep features and then apply an attention mechanism to refine them. The refined features are passed to a transformer block and multi-layer perceptron networks to produce the final emotion predictions. Our model simultaneously predicts arousal and valence, classifies the emotional expression, and estimates the action units. The proposed system achieves a score of 0.917 on the MTL Challenge validation dataset.
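The three simultaneous outputs described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the InceptionNet V3 backbone and the transformer block are left out and replaced by a stand-in feature vector, the attention step is reduced to a simple softmax reweighting, the weights are random, and the class counts (8 expressions, 12 action units) as well as every name below are assumptions chosen for illustration. It only shows how one shared feature vector might feed a regression head (valence, arousal), a single-label classification head (expression), and a multi-label head (action units).

```python
import math
import random

# Assumed sizes; the abstract does not state them.
NUM_EXPRESSIONS = 8
NUM_ACTION_UNITS = 12


def linear(weights, bias, x):
    """Dense layer: one dot product per output row, plus bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init_layer(rng, out_dim, in_dim):
    """Random weights and zero bias (untrained, for illustration only)."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class MultiTaskHeads:
    """Hypothetical shared-feature model with three task-specific heads."""

    def __init__(self, feature_dim, seed=0):
        rng = random.Random(seed)
        self.gate = init_layer(rng, feature_dim, feature_dim)
        self.va_head = init_layer(rng, 2, feature_dim)
        self.expr_head = init_layer(rng, NUM_EXPRESSIONS, feature_dim)
        self.au_head = init_layer(rng, NUM_ACTION_UNITS, feature_dim)

    def forward(self, features):
        # Simplified attention: softmax scores reweight each feature.
        scores = softmax(linear(*self.gate, features))
        refined = [s * f * len(features) for s, f in zip(scores, features)]
        # Regression head: tanh keeps valence and arousal in [-1, 1].
        valence, arousal = (math.tanh(v) for v in linear(*self.va_head, refined))
        # Single-label head: probabilities over expression classes.
        expression = softmax(linear(*self.expr_head, refined))
        # Multi-label head: an independent probability per action unit.
        action_units = [sigmoid(v) for v in linear(*self.au_head, refined)]
        return {
            "valence": valence,
            "arousal": arousal,
            "expression": expression,
            "action_units": action_units,
        }


# Stand-in for deep features extracted from one image.
model = MultiTaskHeads(feature_dim=16)
output = model.forward([0.1 * i for i in range(16)])
predicted_expression = output["expression"].index(max(output["expression"]))
```

The choice of output activations follows the nature of each task: bounded continuous values for valence and arousal, mutually exclusive classes for expression, and independent binary decisions for action units.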
Taxonomy
Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Sentiment Analysis and Opinion Mining
