EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning
Bohao Xing, Zitong Yu, Xin Liu, Kaishen Yuan, Qilang Ye, Weicheng Xie, Huanjing Yue, Jingyu Yang, Heikki Kälviäinen

TL;DR
EMO-LLaMA is a multimodal large language model (MLLM) for facial expression recognition (FER). It combines facial priors, instruction tuning, and demographic attributes, and achieves state-of-the-art or competitive performance on static and dynamic FER datasets.
Contribution
The paper introduces EMO-LLaMA, a new MLLM that incorporates facial priors and demographic attributes through instruction tuning for enhanced FER performance.
Findings
EMO-LLaMA achieves SOTA or competitive results on static and dynamic FER datasets.
Instruction data generated for five FER datasets improves model understanding.
Incorporating facial priors and demographic attributes enhances emotion recognition accuracy.
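To make the idea of instruction data for FER concrete, here is a minimal sketch of how a labeled sample might be turned into an instruction-response pair. It is an illustration only, not the paper's pipeline: the `FerSample` type, the `to_instruction_pair` helper, the prompt wording, the label set, and the attribute fields are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical label set; the datasets used in the paper may define different classes.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]


@dataclass
class FerSample:
    """One labeled FER example (hypothetical schema, not taken from the paper)."""
    media_path: str                   # path to an image or a video clip
    emotion: str                      # ground-truth label, must be in EMOTIONS
    age_group: Optional[str] = None   # optional demographic attribute
    gender: Optional[str] = None      # optional demographic attribute


def to_instruction_pair(sample: FerSample) -> Dict[str, str]:
    """Convert a labeled sample into an instruction-tuning record."""
    if sample.emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion label: {sample.emotion!r}")

    instruction = (
        "What emotion does the person in this image or video express? "
        "Choose one of: " + ", ".join(EMOTIONS) + "."
    )

    # Append demographic attributes only when they are available.
    hints = []
    if sample.age_group:
        hints.append(f"age group: {sample.age_group}")
    if sample.gender:
        hints.append(f"gender: {sample.gender}")
    if hints:
        instruction += " Known attributes: " + "; ".join(hints) + "."

    return {
        "media": sample.media_path,
        "instruction": instruction,
        "response": sample.emotion,
    }


if __name__ == "__main__":
    pair = to_instruction_pair(FerSample("face_001.jpg", "happiness", age_group="adult"))
    print(pair["instruction"])
    print(pair["response"])
```

The same record shape works for images and video clips, because only the media path changes.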
Abstract
Facial expression recognition (FER) is an important research topic in emotional artificial intelligence. In recent decades, researchers have made remarkable progress. However, current FER paradigms face challenges in generalization, lack semantic information aligned with natural language, and struggle to process both images and videos within a unified framework, making their application in multimodal emotion understanding and human-computer interaction difficult. Multimodal Large Language Models (MLLMs) have recently achieved success, offering advantages in addressing these issues and potentially overcoming the limitations of current FER paradigms. However, directly applying pre-trained MLLMs to FER still faces several challenges. Our zero-shot evaluations of existing open-source MLLMs on FER indicate a significant performance gap compared to GPT-4V and current supervised…
Taxonomy
Topics: Face recognition and analysis · Emotion and Mood Recognition
Methods: INFO: An Efficient Optimization Algorithm based on Weighted Mean of Vectors
