Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding

Xiaojiang Peng; Jingyi Chen; Zebang Cheng; Bao Peng; Fengyi Wu; Yifei Dong; Shuyuan Tu; Qiyu Hu; Huiting Huang; Yuxiang Lin; Jun-Yan He; Kai Wang; Zheng Lian; Zhi-Qi Cheng

arXiv:2601.16449 · cs.CV · February 24, 2026

Open Access

TL;DR

This paper introduces Emotion-LLaMAv2 and MMEVerse, a comprehensive framework and benchmark for multimodal emotion understanding, enhancing emotion reasoning with new model components and a large-scale, standardized dataset collection.

Contribution

The paper presents end-to-end multimodal emotion reasoning models and a large, unified dataset collection with standardized evaluation, moving beyond the small, low-quality training data and explicit face detectors that earlier work relied on.

Findings

01. Enhanced emotion reasoning accuracy demonstrated.

02. Unified large-scale dataset improves reproducibility.

03. Novel multimodal fusion techniques outperform prior models.

Abstract

Understanding human emotions from multimodal signals poses a significant challenge in affective computing and human-robot interaction. While multimodal large language models (MLLMs) have excelled in general vision-language tasks, their capabilities in emotional reasoning remain limited. The field currently suffers from a scarcity of large-scale datasets with high-quality, descriptive emotion annotations and lacks standardized benchmarks for evaluation. Our preliminary framework, Emotion-LLaMA, pioneered instruction-tuned multimodal learning for emotion reasoning but was restricted by explicit face detectors, implicit fusion strategies, and low-quality training data with limited scale. To address these limitations, we present Emotion-LLaMAv2 and the MMEVerse benchmark, establishing an end-to-end pipeline together with a standardized evaluation setting for emotion recognition and…

Taxonomy

Topics: Emotion and Mood Recognition · Social Robot Interaction and HRI · Face Recognition and Analysis