Leveraging Textual-Cues for Enhancing Multimodal Sentiment Analysis by Object Recognition
Sumana Biswas, Karen Young, Josephine Griffith

TL;DR
This paper introduces TEMSA, a novel approach that improves multimodal sentiment analysis by appending the names of objects detected in an image to the text associated with that image.
Contribution
The work presents a new method, TEMSA, which leverages object detection to enhance sentiment analysis in multimodal data, outperforming individual modality analysis.
Findings
TEMS improves overall sentiment prediction accuracy.
Object names combined with text enhance multimodal sentiment analysis.
TEMS outperforms baseline methods on two datasets.
Abstract
Multimodal sentiment analysis, which includes both image and text data, presents several challenges due to the dissimilarities in the modalities of text and image, the ambiguity of sentiment, and the complexities of contextual meaning. In this work, we experiment with finding the sentiments of image and text data, individually and in combination, on two datasets. Part of the approach introduces the novel 'Textual-Cues for Enhancing Multimodal Sentiment Analysis' (TEMSA) based on object recognition methods to address the difficulties in multimodal sentiment analysis. Specifically, we extract the names of all objects detected in an image and combine them with associated text; we call this combination of text and image data TEMS. Our results demonstrate that only TEMS improves the results when considering all the object names for the overall sentiment of multimodal data compared to…
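The combination step described in the abstract (object names joined with the associated text) might look roughly like the sketch below. This is an illustration under assumptions, not the authors' implementation: the function name `build_tems`, the space separator, and the sample caption and labels are all hypothetical, and the object names are assumed to come from some upstream object detector that is not shown.

```python
from typing import Iterable


def build_tems(text: str, object_names: Iterable[str], sep: str = " ") -> str:
    """Join the associated text with every detected object name.

    Hypothetical helper: the paper's abstract only says that object names
    are combined with the text; plain concatenation is an assumption.
    """
    # Drop empty labels and surrounding whitespace, but keep duplicates,
    # since the abstract refers to all objects detected in the image.
    names = [name.strip() for name in object_names if name.strip()]
    return sep.join([text.strip(), *names]).strip()


# Hypothetical example: a caption plus labels from an unspecified detector.
caption = "Best day at the beach with my dog"
detected = ["person", "dog", "surfboard"]
tems_input = build_tems(caption, detected)
print(tems_input)
```

The resulting string could then be passed to any text sentiment classifier. How the authors order, weight, or deduplicate object names is not stated in the abstract excerpt.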
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition · Multimodal Machine Learning Applications
