EmoLLM: Multimodal Emotional Understanding Meets Large Language Models

Qu Yang; Mang Ye; Bo Du

arXiv:2406.16442 · cs.CV · July 2, 2024 · 2 cites


TL;DR

The paper introduces EmoLLM, a multimodal model, and EmoBench, a benchmark, for understanding complex human emotions in images and videos. EmoLLM improves the emotional understanding of multimodal large language models by 12.1%.

Contribution

The paper presents EmoBench, a benchmark for evaluating emotional understanding in multimodal large language models (MLLMs), and EmoLLM, a model for improving it.

Findings

01

EmoLLM improves emotional understanding performance by an average of 12.1% on EmoBench.

02

Multi-perspective Visual Projection captures diverse emotional cues.

03

EmoPrompt effectively guides the model's emotional reasoning.

Abstract

Multimodal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks, but their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored. This gap limits their ability to understand and respond to the intricate emotions that humans express through multimodal media. To bridge it, we introduce EmoBench, the first comprehensive benchmark designed specifically to evaluate the emotional capabilities of MLLMs across five popular emotional tasks, using a diverse dataset of 287k images and videos paired with corresponding textual instructions. We also propose EmoLLM, a novel model for multimodal emotional understanding that incorporates two core techniques: 1) Multi-perspective Visual Projection, which captures diverse emotional cues in visual data from multiple perspectives; 2)…
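The abstract describes Multi-perspective Visual Projection only in outline. As a rough illustration of the general idea, and explicitly not EmoLLM's published design, the Python sketch below assumes that each "perspective" is an independent linear map from visual patch features into the LLM embedding space, and that the resulting tokens are concatenated before being passed to the language model. The class `MultiPerspectiveProjector`, its parameters, and the fixed random weights are all hypothetical.

```python
"""Hypothetical sketch of a multi-perspective visual projection.

ASSUMPTION: this is not EmoLLM's published architecture. It only illustrates
mapping one set of visual features into an LLM embedding space through several
independent projections ("perspectives") and concatenating the outputs.
"""
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def random_projection(d_in: int, d_out: int, seed: int) -> Matrix:
    """Deterministic (d_in x d_out) weight matrix; stands in for learned weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, d_in ** -0.5) for _ in range(d_out)] for _ in range(d_in)]


class MultiPerspectiveProjector:
    """Hypothetical projector: one independent linear map per perspective."""

    def __init__(self, d_vis: int, d_llm: int, num_perspectives: int = 3):
        self.projections = [
            random_projection(d_vis, d_llm, seed=p) for p in range(num_perspectives)
        ]

    def __call__(self, visual_tokens: Matrix) -> Matrix:
        # visual_tokens: (num_patches x d_vis).
        # Returns (num_perspectives * num_patches) x d_llm tokens, grouped by perspective.
        out: Matrix = []
        for w in self.projections:
            out.extend(matmul(visual_tokens, w))
        return out


if __name__ == "__main__":
    rng = random.Random(0)
    patches = [[rng.random() for _ in range(6)] for _ in range(4)]  # 4 patches, 6-dim features
    projector = MultiPerspectiveProjector(d_vis=6, d_llm=8, num_perspectives=3)
    tokens = projector(patches)
    print(len(tokens), len(tokens[0]))  # 12 8
```

A trained model would learn these projections and might use nonlinear layers; the fixed random weights here only make the shapes and the per-perspective concatenation concrete.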


Code & Models

Repositories

yan9qu/emollm (official)


Taxonomy

Topics: Speech and dialogue systems