TL;DR
This paper provides a comprehensive, fair, and reproducible comparison of six state-of-the-art multimodal tweet sentiment analysis methods, exploring textual, visual, and multimodal feature embeddings and evaluating on two publicly available benchmark datasets.
Contribution
It introduces a reproducible evaluation scheme, compares six methods together with several textual, visual, and multimodal (CLIP) feature embeddings, and analyzes their limitations to guide future improvements.
Findings
CLIP embeddings improve multimodal sentiment analysis accuracy
Reproducible evaluation scheme enhances comparability of results
Error analysis reveals key limitations and future directions
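The summary does not spell out the evaluation protocol itself. As a rough illustration of what a "reproducible and fair" comparison can look like, here is a minimal sketch that assumes a seeded, stratified k-fold protocol: every method is scored on identical folds and reported as mean ± standard deviation of accuracy. The function names, the `predict_fn` callback interface, the seed, and the toy label distribution are all hypothetical; this is not the paper's exact scheme.

```python
import random
import statistics
from collections import defaultdict


def stratified_folds(labels, k=5, seed=42):
    """Assign every sample index to one of k folds, keeping class proportions.

    A fixed seed and a local RNG make the split identical on every run,
    independent of any other code that uses the global random module.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    slot = 0
    for label in sorted(by_class):  # sorted keys: deterministic iteration order
        indices = by_class[label]
        rng.shuffle(indices)
        for idx in indices:  # deal indices round-robin across the folds
            folds[slot % k].append(idx)
            slot += 1
    return [sorted(fold) for fold in folds]


def cross_validate(labels, predict_fn, k=5, seed=42):
    """Return (mean, stdev) of accuracy over the k folds.

    predict_fn(train_idx, test_idx) is a hypothetical callback that trains a
    model on train_idx and returns one predicted label per index in test_idx.
    """
    folds = stratified_folds(labels, k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        preds = predict_fn(train_idx, test_idx)
        correct = sum(pred == labels[j] for pred, j in zip(preds, test_idx))
        accuracies.append(correct / len(test_idx))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Toy example: 100 hypothetical tweets, scored with a majority-class baseline.
labels = ["pos"] * 60 + ["neg"] * 30 + ["neu"] * 10


def majority_baseline(train_idx, test_idx):
    majority = statistics.mode(labels[j] for j in train_idx)
    return [majority] * len(test_idx)


mean_acc, std_acc = cross_validate(labels, majority_baseline, k=5, seed=42)
print(f"accuracy: {mean_acc:.3f} ± {std_acc:.3f}")  # accuracy: 0.600 ± 0.000
```

Because the folds depend only on the labels, `k`, and the seed, replacing `majority_baseline` with any other method reuses exactly the same splits, so the resulting scores are directly comparable. Here every fold holds 12 positives out of 20 samples, which is why the baseline scores 0.600 with zero spread.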
Abstract
Opinion and sentiment analysis is a vital task for characterizing subjective information in social media posts. In this paper, we present a comprehensive experimental evaluation and comparison of six state-of-the-art methods, one of which we re-implemented. In addition, we investigate different textual and visual feature embeddings that cover different aspects of the content, as well as the recently introduced multimodal CLIP embeddings. Experimental results are presented for two publicly available benchmark datasets of tweets and their corresponding images. In contrast to the evaluation methodology of previous work, we introduce a reproducible and fair evaluation scheme that makes results comparable. Finally, we conduct an error analysis to outline the limitations of the methods and possible directions for future work.
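The abstract contrasts separate textual and visual embeddings with joint multimodal ones such as CLIP. To illustrate why combining modalities can help, here is a minimal sketch of early (feature-level) fusion, assuming the text and image embeddings have already been extracted, for example by CLIP's text and image encoders. The tiny 2-dimensional vectors, the helper names, and the nearest-centroid classifier are hypothetical stand-ins chosen to keep the example self-contained; none of them is one of the six compared methods.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so neither modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_emb, image_emb):
    """Early fusion: normalize each modality separately, then concatenate."""
    return l2_normalize(text_emb) + l2_normalize(image_emb)


def fit_centroids(features, labels):
    """Nearest-centroid training: average the feature vectors of each class."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((c - v) ** 2 for c, v in zip(centroids[label], vec)),
    )


# Hypothetical 2-d "text" and "image" embeddings; real CLIP vectors are much larger.
train = [
    ([1.0, 0.1], [0.9, 0.2], "pos"),
    ([0.8, 0.2], [1.0, 0.0], "pos"),
    ([0.1, 1.0], [0.2, 0.9], "neg"),
    ([0.0, 0.7], [0.1, 1.0], "neg"),
]
train_labels = [y for _, _, y in train]
centroids = fit_centroids([fuse(t, i) for t, i, _ in train], train_labels)

print(predict(centroids, fuse([0.9, 0.0], [0.7, 0.3])))  # pos
print(predict(centroids, fuse([0.2, 0.8], [0.0, 1.0])))  # neg

# Text alone points to "neg"; adding the image embedding flips the fused decision.
text_only = fit_centroids([l2_normalize(t) for t, _, _ in train], train_labels)
print(predict(text_only, l2_normalize([0.4, 0.6])))  # neg
print(predict(centroids, fuse([0.4, 0.6], [1.0, 0.0])))  # pos
```

Normalizing each modality before concatenating keeps one encoder's scale from swamping the other. A learned classifier (for example logistic regression or a small MLP) would typically replace the centroid rule in a real system.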
Taxonomy
Methods: Contrastive Language-Image Pre-training (CLIP)
