Multimodal Emotion Recognition with Large Language Models
Hongrui Zhang, Daiqing Wu, Yangyang Li, Kuien Liu, Yuhui Wang, Yu Zhou, Sicheng Zhao

TL;DR
This paper reviews the emerging paradigm of using Large Language Models for Multimodal Emotion Recognition, highlighting challenges, research directions, and future prospects in the field.
Contribution
The paper systematically categorizes existing MER-with-LLMs research, mapping the field's development, trends, and open issues.
Findings
Identification of key challenges, such as the scarcity of emotionally annotated data and affective gaps within and across modalities.
Categorization of research into three main directions.
Analysis of emerging trends and future research opportunities.
Abstract
Multimodal Emotion Recognition (MER) focuses on identifying and interpreting emotions from modality-compound inputs. Closely mirroring human cognitive processes in real-world environments, MER has drawn substantial attention from both academia and industry. Recently, a paradigm shift has been unveiled in MER, from leveraging small-scale, task-specific models to Large Language Models (LLMs). We refer to the latter as the MER-with-LLMs paradigm, which offers unprecedented generality, spurring numerous empirical attempts, even alongside speculation about LLMs' potential to achieve general emotional intelligence. However, with these new opportunities come new challenges, including the scarcity of emotionally annotated data, the affective gap both within and across modalities, and the opacity of affective interpretation. To systematically review existing research and guide future…
