EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning

Bohao Xing; Zitong Yu; Xin Liu; Kaishen Yuan; Qilang Ye; Weicheng Xie; Huanjing Yue; Jingyu Yang; Heikki Kälviäinen

arXiv:2408.11424·cs.CV·August 22, 2024·2 cites

TL;DR

EMO-LLaMA is a multimodal large language model designed to improve facial emotion recognition by integrating facial priors, semantic instruction tuning, and demographic attributes, achieving state-of-the-art or competitive performance on facial expression recognition datasets.

Contribution

The paper introduces EMO-LLaMA, a new MLLM that incorporates facial priors and demographic attributes through instruction tuning for enhanced FER performance.

Findings

01. EMO-LLaMA achieves SOTA or competitive results on static and dynamic FER datasets.

02. Instruction data generated for five FER datasets improves model understanding.

03. Incorporating facial priors and demographic attributes enhances emotion recognition accuracy.

Abstract

Facial expression recognition (FER) is an important research topic in emotional artificial intelligence. In recent decades, researchers have made remarkable progress. However, current FER paradigms face challenges in generalization, lack semantic information aligned with natural language, and struggle to process both images and videos within a unified framework, making their application in multimodal emotion understanding and human-computer interaction difficult. Multimodal Large Language Models (MLLMs) have recently achieved success, offering advantages in addressing these issues and potentially overcoming the limitations of current FER paradigms. However, directly applying pre-trained MLLMs to FER still faces several challenges. Our zero-shot evaluations of existing open-source MLLMs on FER indicate a significant performance gap compared to GPT-4V and current supervised…

Code & Models

Repositories

xxtars/emo-llama (Official)

Taxonomy

Topics: Face recognition and analysis · Emotion and Mood Recognition

Methods: INFO: An Efficient Optimization Algorithm based on Weighted Mean of Vectors