Transfer Learning with Joint Fine-Tuning for Multimodal Sentiment Analysis
Guilherme Lourenço de Toledo, Ricardo Marcondes Marcacini

TL;DR
This paper proposes a transfer learning method with joint fine-tuning for multimodal sentiment analysis, effectively combining pre-trained unimodal models to improve performance and efficiency, especially in low-resource settings.
Contribution
It introduces a flexible, efficient joint fine-tuning approach that leverages pre-trained models for text and images, reducing computational costs compared to existing multimodal models.
Findings
Achieved competitive sentiment analysis results with a simpler fine-tuning strategy.
Demonstrated flexibility in incorporating various pre-trained models.
Effective in low-resource sentiment classification scenarios.
Abstract
Most existing methods focus on sentiment analysis of textual data. However, recently there has been a massive use of images and videos on social platforms, motivating sentiment analysis from other modalities. Current studies show that exploring other modalities (e.g., images) increases sentiment analysis performance. State-of-the-art multimodal models, such as CLIP and VisualBERT, are pre-trained on datasets with the text paired with images. Although the results obtained by these models are promising, pre-training and sentiment analysis fine-tuning tasks of these models are computationally expensive. This paper introduces a transfer learning approach using joint fine-tuning for sentiment analysis. Our proposal achieved competitive results using a more straightforward alternative fine-tuning strategy that leverages different pre-trained unimodal models and efficiently combines them in a…
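The abstract's idea of combining pre-trained unimodal models and fine-tuning them jointly can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the fusion architecture, so the sketch assumes simple concatenation of the two embeddings followed by a shared softmax classifier. The two linear maps stand in for real pre-trained text and image encoders, and every name here (`JointModel`, `W_text`, `W_image`, `W_head`, `train_step`) is hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init(rows, cols, rng):
    """Small random matrix; a stand-in for pre-trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class JointModel:
    """Toy two-branch model (assumed design): linear encoders + concatenation."""

    def __init__(self, text_dim, image_dim, emb_dim, n_classes, seed=0):
        rng = random.Random(seed)
        self.W_text = init(emb_dim, text_dim, rng)      # stand-in text encoder
        self.W_image = init(emb_dim, image_dim, rng)    # stand-in image encoder
        self.W_head = init(n_classes, 2 * emb_dim, rng) # shared sentiment head

    def forward(self, text, image):
        # Fuse by concatenating the two unimodal embeddings.
        fused = matvec(self.W_text, text) + matvec(self.W_image, image)
        return fused, softmax(matvec(self.W_head, fused))

    def train_step(self, text, image, label, lr=0.1):
        """One SGD step on cross-entropy that updates the head AND both encoders."""
        fused, probs = self.forward(text, image)
        loss = -math.log(probs[label])
        # Gradient of the loss w.r.t. the logits: probs - one_hot(label).
        d_logits = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
        # Backpropagate through the head before changing its weights.
        d_fused = [
            sum(self.W_head[k][j] * d_logits[k] for k in range(len(d_logits)))
            for j in range(len(fused))
        ]
        for k, row in enumerate(self.W_head):
            for j in range(len(row)):
                row[j] -= lr * d_logits[k] * fused[j]
        # The same loss signal reaches both encoders: this is the "joint" part.
        emb = len(self.W_text)
        for i in range(emb):
            for j in range(len(text)):
                self.W_text[i][j] -= lr * d_fused[i] * text[j]
            for j in range(len(image)):
                self.W_image[i][j] -= lr * d_fused[emb + i] * image[j]
        return loss


# Made-up feature vectors for one text-image pair with sentiment label 1.
text = [0.2, -0.4, 0.7, 0.1]
image = [0.5, 0.3, -0.6]
model = JointModel(text_dim=4, image_dim=3, emb_dim=2, n_classes=3)
losses = [model.train_step(text, image, label=1) for _ in range(20)]
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

In a realistic setting the linear stand-ins would be replaced by actual pre-trained encoders, and an autograd framework would handle the backward pass. The point of the sketch is only that one classification loss updates both branches and the shared head together.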
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Multimodal Machine Learning Applications
Methods: VisualBERT · Contrastive Language-Image Pre-training
