Holistic Visual-Textual Sentiment Analysis with Prior Models
Junyu Chen, Jie An, Hanjia Lyu, Christopher Kanan, Jiebo Luo

TL;DR
This paper introduces a comprehensive multimodal sentiment analysis approach that leverages pre-trained models and multiple feature extraction branches to improve accuracy in visual-textual sentiment prediction.
Contribution
The paper presents a holistic framework for visual-textual sentiment analysis that combines a data-driven visual-textual branch, a set of pre-trained expert visual encoders, and a CLIP branch, fused by a BERT-based network.
Findings
Outperforms existing methods on three benchmark datasets.
Effectively captures semantic visual features with expert encoders.
Improves sentiment prediction accuracy through multimodal feature fusion.
Abstract
Visual-textual sentiment analysis aims to predict sentiment from an image-text pair, which poses a challenge in learning effective features for diverse input images. To address this, we propose a holistic method that achieves robust visual-textual sentiment analysis by exploiting a rich set of powerful pre-trained visual and textual prior models. The proposed method consists of four parts: (1) a visual-textual branch to learn features directly from data for sentiment analysis, (2) a visual expert branch with a set of pre-trained "expert" encoders to extract selected semantic visual features, (3) a CLIP branch to implicitly model visual-textual correspondence, and (4) a multimodal feature fusion network based on BERT to fuse multimodal features and make sentiment predictions. Extensive experiments on three datasets show that our method produces better visual-textual…
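The four-part design described in the abstract can be pictured as three branches that each emit feature tokens, followed by a fusion step that turns the tokens into a sentiment distribution. The sketch below is only an illustration of that data flow under stated assumptions: every encoder is a deterministic stub rather than a real pre-trained model, the expert names, feature width, and three-class output are hypothetical, and the fusion step uses mean pooling plus a fixed linear layer as a stand-in for the BERT-based fusion network.

```python
import math
import random
from typing import Callable, Dict, List

Vector = List[float]
DIM = 8          # toy feature width (assumption, not from the paper)
NUM_CLASSES = 3  # e.g. negative / neutral / positive (assumption)


def stub_encoder(name: str) -> Callable[[str], Vector]:
    """Stand-in for a pre-trained model: maps an input to a fixed pseudo-random vector."""
    def encode(x: str) -> Vector:
        rng = random.Random(f"{name}|{x}")  # string seeds give reproducible values
        return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    return encode


# (1) Visual-textual branch: in the paper these features are learned from data.
vt_image, vt_text = stub_encoder("vt_image"), stub_encoder("vt_text")
# (2) Visual expert branch: hypothetical expert names, one token per expert.
EXPERTS: Dict[str, Callable[[str], Vector]] = {
    n: stub_encoder(n) for n in ("expert_a", "expert_b", "expert_c")
}
# (3) CLIP branch: image and text embeddings (stubbed here).
clip_image, clip_text = stub_encoder("clip_image"), stub_encoder("clip_text")


def collect_tokens(image: str, text: str) -> List[Vector]:
    """Gather one feature token per encoder across the three branches."""
    tokens = [vt_image(image), vt_text(text)]
    tokens += [enc(image) for enc in EXPERTS.values()]
    tokens += [clip_image(image), clip_text(text)]
    return tokens


def softmax(logits: Vector) -> Vector:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(tokens: List[Vector]) -> Vector:
    """(4) Placeholder fusion: mean-pool tokens, then a fixed linear layer and softmax.
    The paper uses a BERT-based fusion network at this step instead."""
    pooled = [sum(col) / len(tokens) for col in zip(*tokens)]
    rng = random.Random("classifier")
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(NUM_CLASSES)]
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in weights]
    return softmax(logits)


def predict_sentiment(image: str, text: str) -> Vector:
    return fuse_and_classify(collect_tokens(image, text))
```

Calling predict_sentiment("photo.jpg", "what a lovely day") returns a list of NUM_CLASSES probabilities. Because the stubs carry no learned weights, the values are meaningless as predictions; the point is only how tokens from the three branches meet in a single fusion step.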
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Residual Connection · Attention Dropout · Softmax · Dropout · Contrastive Language-Image Pre-training
