Pioneering Multimodal Emotion Recognition in the Era of Large Models: From Closed Sets to Open Vocabularies
Jing Han, Zhiqiang Gao, Shihao Gao, Jialing Liu, Hongyu Chen, Zixing Zhang, Björn W. Schuller

TL;DR
This paper conducts the first large-scale benchmarking of multimodal large language models for open-vocabulary emotion recognition, analyzing their reasoning, fusion strategies, and modality importance to guide future emotion AI development.
Contribution
It evaluates 19 MLLMs on MER-OV, identifies the best-performing fusion strategy, shows that video is the most important modality, and provides benchmarks and practical insights for advancing emotion recognition.
Findings
Two-stage trimodal fusion yields best performance.
Video modality is most critical for emotion recognition.
Narrow performance gap between open- and closed-source LLMs.
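The summary does not say how the two-stage trimodal fusion is implemented. One plausible reading, sketched below purely as an assumption, is that stage one turns each modality (audio, video, text) into a textual description of emotional clues, and stage two asks a language model to merge those descriptions into a free-form list of emotion labels. Every name here (`two_stage_fusion`, `build_fusion_prompt`, `parse_labels`, the `describers` and `fuse` callables, the prompt wording) is hypothetical and is not taken from the paper.

```python
from typing import Callable, Dict, List

# Assumed modality names; the paper's actual pipeline may differ.
MODALITIES = ("audio", "video", "text")


def build_fusion_prompt(clues: Dict[str, str]) -> str:
    """Combine per-modality clue descriptions into one stage-two prompt."""
    lines = [f"{name} clues: {clues[name]}" for name in MODALITIES]
    lines.append("List every emotion the speaker expresses, separated by commas.")
    return "\n".join(lines)


def parse_labels(answer: str) -> List[str]:
    """Split a comma-separated answer into lowercase, de-duplicated labels."""
    labels: List[str] = []
    for part in answer.split(","):
        label = part.strip().lower()
        if label and label not in labels:
            labels.append(label)
    return labels


def two_stage_fusion(
    inputs: Dict[str, str],
    describers: Dict[str, Callable[[str], str]],
    fuse: Callable[[str], str],
) -> List[str]:
    """Hypothetical two-stage fusion over three modalities.

    Stage 1: each modality is described independently by its own model.
    Stage 2: a single model reads all descriptions and names the emotions.
    """
    clues = {name: describers[name](inputs[name]) for name in MODALITIES}
    return parse_labels(fuse(build_fusion_prompt(clues)))
```

In this sketch the models are passed in as plain callables, so any captioning model or MLLM client could be substituted without changing the fusion logic. Because the output is an open list of labels rather than one class from a fixed set, it matches the open-vocabulary setting the paper describes.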
Abstract
Recent advances in multimodal large language models (MLLMs) have demonstrated remarkable multi- and cross-modal integration capabilities. However, their potential for fine-grained emotion understanding remains systematically underexplored. While open-vocabulary multimodal emotion recognition (MER-OV) has emerged as a promising direction to overcome the limitations of closed emotion sets, no comprehensive evaluation of MLLMs in this context currently exists. To address this, our work presents the first large-scale benchmarking study of MER-OV on the OV-MERD dataset, evaluating 19 mainstream MLLMs, including general-purpose, modality-specialized, and reasoning-enhanced architectures. Through systematic analysis of model reasoning capacity, fusion strategies, contextual utilization, and prompt design, we provide key insights into the capabilities and limitations of current MLLMs for…
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Mental Health via Writing
