EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models

He Hu; Lianzhong You; Hongbo Xu; Qianning Wang; Fei Richard Yu; Fei Ma; Zebang Cheng; Zheng Lian; Yucheng Zhou; Laizhong Cui

arXiv:2502.04424 · cs.CL · April 28, 2026

TL;DR

EmoBench-M is a benchmark for evaluating the emotional intelligence of multimodal large language models (MLLMs) across 13 scenarios. It reveals a substantial gap between current models and human-level performance and points to where future models need to improve.

Contribution

The paper introduces EmoBench-M, a new psychological theory-based benchmark for assessing MLLMs' emotional intelligence in multimodal, dynamic contexts.

Findings

01 Top models like Gemini-3.0-Pro and GPT-5.2 score 70.5 and 66.5, respectively, on EmoBench-M.

02 Specialized models such as AffectGPT show uneven performance across different emotional scenarios.

03 There is a substantial gap between current MLLMs and human-level emotional intelligence.

Abstract

With the integration of multimodal large language models (MLLMs) into robotic systems and AI applications, embedding emotional intelligence (EI) capabilities is essential for enabling these models to perceive, interpret, and respond to human emotions effectively in real-world scenarios. Existing static, text-based, or text-image benchmarks overlook the multimodal complexities of real interactions and fail to capture the dynamic, context-dependent nature of emotional expressions, rendering them inadequate for evaluating MLLMs' EI capabilities. To address these limitations, we introduce EmoBench-M, a systematic benchmark grounded in established psychological theories, designed to evaluate MLLMs across 13 evaluation scenarios spanning three hierarchical dimensions: foundational emotion recognition (FER), conversational emotion understanding (CEU), and socially complex emotion analysis…
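The hierarchical layout described in the abstract (scenarios grouped under dimensions) can be illustrated with a small scoring sketch. This is not the authors' evaluation code. It assumes each model answer has already been judged correct or incorrect, and that a dimension score is the unweighted mean of its scenarios' accuracies. The function name `dimension_scores`, the tuple layout, and the scenario labels in the usage note are all hypothetical.

```python
from collections import defaultdict


def dimension_scores(results):
    """Aggregate per-item judgments into per-dimension scores.

    results: iterable of (dimension, scenario, is_correct) tuples.
    Returns {dimension: mean of per-scenario accuracies, in percent}.
    Assumption: scenarios are weighted equally within a dimension.
    """
    # (dimension, scenario) -> [number correct, number seen]
    per_scenario = defaultdict(lambda: [0, 0])
    for dimension, scenario, is_correct in results:
        counts = per_scenario[(dimension, scenario)]
        counts[0] += int(is_correct)
        counts[1] += 1

    # Collect each scenario's accuracy under its dimension.
    by_dimension = defaultdict(list)
    for (dimension, _scenario), (correct, total) in per_scenario.items():
        by_dimension[dimension].append(100.0 * correct / total)

    # Unweighted mean over scenarios (an assumed aggregation rule).
    return {dim: sum(accs) / len(accs) for dim, accs in by_dimension.items()}
```

For example, calling `dimension_scores([("FER", "s1", True), ("FER", "s1", False), ("FER", "s2", True), ("CEU", "s3", False)])` with the placeholder scenario labels `s1` to `s3` averages the two FER scenarios (50% and 100%) into a single FER score and reports CEU separately. Weighting scenarios equally keeps a large scenario from dominating its dimension; a sample-weighted mean would be an equally plausible choice.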

Code & Models

Datasets

GMLHUHE/Emobench-M
dataset · 73 downloads