Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting

Miaosen Luo; Jiesen Long; Zequn Li; Yunying Yang; Yuncheng Jiang; Sijie Mai

arXiv:2508.02429·cs.AI·August 5, 2025

TL;DR

This paper benchmarks multimodal large language models for affective computing, analyzes their performance variability, and introduces a generative knowledge prompting strategy that significantly boosts their ability to interpret human emotions.

Contribution

The paper provides a systematic evaluation of open-source MLLMs on Multimodal Affective Computing (MAC) tasks and proposes a hybrid method that combines prompting with fine-tuning to enhance their performance.

Findings

01. Benchmarking reveals performance gaps across models and datasets.

02. Hybrid prompting and fine-tuning improve accuracy on MAC tasks.

03. The evaluation offers insights into how architectural design and data characteristics influence affective analysis.

Abstract

Multimodal Affective Computing (MAC) aims to recognize and interpret human emotions by integrating information from diverse modalities such as text, video, and audio. Recent advancements in Multimodal Large Language Models (MLLMs) have significantly reshaped the landscape of MAC by offering a unified framework for processing and aligning cross-modal information. However, practical challenges remain, including performance variability across complex MAC tasks and insufficient understanding of how architectural designs and data characteristics impact affective analysis. To address these gaps, we conduct a systematic benchmark evaluation of state-of-the-art open-source MLLMs capable of concurrently processing audio, visual, and textual modalities across multiple established MAC datasets. Our evaluation not only compares the performance of these MLLMs but also provides actionable insights…

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications