Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding
Xiaojiang Peng, Jingyi Chen, Zebang Cheng, Bao Peng, Fengyi Wu, Yifei Dong, Shuyuan Tu, Qiyu Hu, Huiting Huang, Yuxiang Lin, Jun-Yan He, Kai Wang, Zheng Lian, Zhi-Qi Cheng

TL;DR
This paper introduces Emotion-LLaMAv2 and MMEVerse, a comprehensive framework and benchmark for multimodal emotion understanding, enhancing emotion reasoning with new model components and a large-scale, standardized dataset collection.
Contribution
The paper presents novel end-to-end multimodal emotion reasoning models and a large, unified dataset with standardized evaluation, advancing beyond previous limited and low-quality data and explicit face detection methods.
Findings
Enhanced emotion reasoning accuracy demonstrated.
Unified large-scale dataset improves reproducibility.
Novel multimodal fusion techniques outperform prior models.
Abstract
Understanding human emotions from multimodal signals poses a significant challenge in affective computing and human-robot interaction. While multimodal large language models (MLLMs) have excelled in general vision-language tasks, their capabilities in emotional reasoning remain limited. The field currently suffers from a scarcity of large-scale datasets with high-quality, descriptive emotion annotations and lacks standardized benchmarks for evaluation. Our preliminary framework, Emotion-LLaMA, pioneered instruction-tuned multimodal learning for emotion reasoning but was restricted by explicit face detectors, implicit fusion strategies, and low-quality training data with limited scale. To address these limitations, we present Emotion-LLaMAv2 and the MMEVerse benchmark, establishing an end-to-end pipeline together with a standardized evaluation setting for emotion recognition and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEmotion and Mood Recognition · Social Robot Interaction and HRI · Face recognition and analysis
