Multimodal LLMs See Sentiment

Neemias B. da Silva; John Harrison; Rodrigo Minetto; Myriam R. Delgado; Bogdan T. Nassu; Thiago H. Silva

arXiv:2508.16873 · cs.CV · December 3, 2025

Open Access

TL;DR

This paper introduces MLLMsent, a framework for investigating how well multimodal large language models (MLLMs) reason about sentiment in images. Its fine-tuned variant achieves state-of-the-art results on an established benchmark and generalizes across datasets without retraining.

Contribution

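The paper presents a framework that examines sentiment reasoning in multimodal models from three perspectives: (1) direct sentiment classification from images with an MLLM; (2) sentiment analysis of automatically generated image descriptions with a pre-trained LLM; and (3) fine-tuning the LLM on sentiment-labeled image descriptions. The fine-tuned approach in particular achieves state-of-the-art results.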
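The three perspectives the paper investigates (direct classification, description-based analysis, and fine-tuning on labeled descriptions) can be pictured as a small pipeline. The following is a minimal sketch under stated assumptions, not the authors' implementation: every function name, the three-way label set, and the keyword-based toy models are hypothetical stand-ins chosen so the example runs without any real MLLM or LLM.

```python
from typing import Callable, List, Tuple

# Assumption: an "image" is represented by its file path, and the label set
# is positive / neutral / negative. Neither detail is taken from the paper.
Image = str
Describer = Callable[[Image], str]         # MLLM: image -> text description
TextClassifier = Callable[[str], str]      # LLM: description -> sentiment label
ImageClassifier = Callable[[Image], str]   # MLLM: image -> sentiment label


def classify_direct(image: Image, mllm_classify: ImageClassifier) -> str:
    """Perspective 1: ask the MLLM for a sentiment label directly."""
    return mllm_classify(image)


def classify_via_description(
    image: Image, describe: Describer, classify_text: TextClassifier
) -> str:
    """Perspective 2: the MLLM describes the image, an LLM labels the text."""
    return classify_text(describe(image))


def build_finetuning_pairs(
    labeled_images: List[Tuple[Image, str]], describe: Describer
) -> List[Tuple[str, str]]:
    """Perspective 3 (data preparation only): pair each generated
    description with its sentiment label for later LLM fine-tuning."""
    return [(describe(image), label) for image, label in labeled_images]


# Hypothetical toy models; a real system would call an actual MLLM / LLM.
def toy_describe(image: Image) -> str:
    captions = {
        "beach.jpg": "friends laughing on a sunny beach",
        "storm.jpg": "a flooded street after a storm",
    }
    return captions.get(image, "an unremarkable scene")


def toy_classify_text(text: str) -> str:
    if any(word in text for word in ("laughing", "sunny")):
        return "positive"
    if any(word in text for word in ("flooded", "storm")):
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify_via_description("beach.jpg", toy_describe, toy_classify_text))
    print(build_finetuning_pairs([("storm.jpg", "negative")], toy_describe))
```

Passing the models in as plain callables keeps the sketch independent of any particular model API; swapping the toy functions for real model calls would not change the pipeline functions.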

Findings

01 State-of-the-art sentiment classification accuracy

02 Up to 30.9% improvement over baselines

03 Effective cross-dataset generalization without retraining

Abstract

Understanding how visual content communicates sentiment is critical in an era where online interaction is increasingly dominated by this kind of media on social platforms. However, this remains a challenging problem, as sentiment perception is closely tied to complex, scene-level semantics. In this paper, we propose an original framework, MLLMsent, to investigate the sentiment reasoning capabilities of Multimodal Large Language Models (MLLMs) through three perspectives: (1) using those MLLMs for direct sentiment classification from images; (2) associating them with pre-trained LLMs for sentiment analysis on automatically generated image descriptions; and (3) fine-tuning the LLMs on sentiment-labeled image descriptions. Experiments on a recent and established benchmark demonstrate that our proposal, particularly the fine-tuned approach, achieves state-of-the-art results outperforming…

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications · Emotion and Mood Recognition