Multimodal LLMs See Sentiment
Neemias B. da Silva, John Harrison, Rodrigo Minetto, Myriam R. Delgado, Bogdan T. Nassu, Thiago H. Silva

TL;DR
This paper introduces MLLMsent, a framework for evaluating how well multimodal large language models understand the sentiment conveyed by images. Its fine-tuned variant achieves state-of-the-art results and generalizes to another dataset without retraining.
Contribution
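The paper presents a framework that examines sentiment reasoning in multimodal models from three perspectives: direct sentiment classification from images, sentiment analysis of automatically generated image descriptions by pre-trained LLMs, and fine-tuning LLMs on sentiment-labeled image descriptions. The reported gains reach up to 30.9% over baselines.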
Findings
State-of-the-art sentiment classification accuracy
Up to 30.9% improvement over baselines
Effective cross-dataset generalization without retraining
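Cross-dataset generalization without retraining amounts to scoring one fixed classifier on samples from a dataset it was not trained on. The sketch below is a minimal illustration under that reading; the function names, the dataset names, and the toy keyword classifier are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]  # (input, gold label); the input could be an image description


def accuracy(classifier: Callable[[str], str], samples: List[Sample]) -> float:
    """Fraction of samples whose predicted label equals the gold label."""
    if not samples:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(1 for text, gold in samples if classifier(text) == gold)
    return correct / len(samples)


def cross_dataset_report(
    classifier: Callable[[str], str],
    datasets: Dict[str, List[Sample]],
) -> Dict[str, float]:
    """Score the same, unchanged classifier on every dataset (no retraining)."""
    return {name: accuracy(classifier, samples) for name, samples in datasets.items()}


# Hypothetical stand-in for a fine-tuned model: a trivial keyword rule.
def toy_classifier(text: str) -> str:
    return "positive" if "sunny" in text else "negative"


# Hypothetical toy data; "source" plays the training benchmark, "target" the unseen one.
toy_datasets = {
    "source": [("a sunny beach", "positive"), ("a flooded street", "negative")],
    "target": [("a sunny park", "positive"), ("a quiet office", "neutral")],
}
report = cross_dataset_report(toy_classifier, toy_datasets)
```

A drop between the source and target scores is the quantity of interest here: the smaller it is, the better the classifier transfers.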
Abstract
Understanding how visual content communicates sentiment is critical in an era where online interaction is increasingly dominated by this kind of media on social platforms. However, this remains a challenging problem, as sentiment perception is closely tied to complex, scene-level semantics. In this paper, we propose an original framework, MLLMsent, to investigate the sentiment reasoning capabilities of Multimodal Large Language Models (MLLMs) through three perspectives: (1) using those MLLMs for direct sentiment classification from images; (2) associating them with pre-trained LLMs for sentiment analysis on automatically generated image descriptions; and (3) fine-tuning the LLMs on sentiment-labeled image descriptions. Experiments on a recent and established benchmark demonstrate that our proposal, particularly the fine-tuned approach, achieves state-of-the-art results outperforming…
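The three perspectives listed in the abstract can be sketched as three small functions. This is only an illustration of the data flow, not the authors' implementation: the label set, the prompts, and the callables `mllm` and `llm` are assumptions, and the stub models at the end stand in for real model calls.

```python
from typing import Callable, Dict, List

# Assumed label set; the abstract does not state the classes used.
LABELS = ("negative", "neutral", "positive")

# Hypothetical model interfaces: an MLLM takes (image_path, prompt),
# a text-only LLM takes a prompt; both return free-form text.
MLLM = Callable[[str, str], str]
LLM = Callable[[str], str]

DESCRIBE_PROMPT = "Describe this image in detail."


def normalize_label(raw: str) -> str:
    """Map free-form model output onto LABELS; fall back to 'neutral'."""
    text = raw.strip().lower()
    for label in LABELS:
        if label in text:
            return label
    return "neutral"


def classify_direct(image_path: str, mllm: MLLM) -> str:
    """Perspective 1: the MLLM classifies sentiment straight from the image."""
    prompt = "Classify the sentiment of this image as negative, neutral, or positive."
    return normalize_label(mllm(image_path, prompt))


def classify_via_description(image_path: str, mllm: MLLM, llm: LLM) -> str:
    """Perspective 2: the MLLM describes the image, a pre-trained LLM classifies the text."""
    description = mllm(image_path, DESCRIBE_PROMPT)
    return normalize_label(llm(f"What is the sentiment of this description? {description}"))


def build_finetuning_pairs(
    image_paths: List[str], labels: List[str], mllm: MLLM
) -> List[Dict[str, str]]:
    """Perspective 3 (data preparation only): pair generated descriptions with gold labels."""
    return [
        {"text": mllm(path, DESCRIBE_PROMPT), "label": label}
        for path, label in zip(image_paths, labels)
    ]


# Stub models so the sketch runs without any external service.
def stub_mllm(image_path: str, prompt: str) -> str:
    if prompt == DESCRIBE_PROMPT:
        return "A crowded beach on a sunny day."
    return "Positive"


def stub_llm(prompt: str) -> str:
    return "The sentiment is positive."
```

The third function stops at building the training pairs; the fine-tuning step itself depends on the chosen LLM and training library, which the abstract does not specify.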
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications · Emotion and Mood Recognition
