Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks
Danae Sánchez Villegas, Daniel Preoţiuc-Pietro, Nikolaos Aletras

TL;DR
This paper explores the use of auxiliary tasks, specifically Image-Text Contrastive and Image-Text Matching, to improve the performance of multimodal social media post classification models by better capturing cross-modal semantics.
Contribution
It introduces a joint training approach that adds two auxiliary losses to the main classification loss. This improves multimodal understanding in social media post classification, particularly where the relation between image and text is weak.
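A minimal plain-Python sketch of what such a joint objective could look like is shown below: a main-task cross-entropy term plus weighted ITC and ITM terms. The weights lambda_itc and lambda_itm, all helper names, and the binary formulation of ITM (genuine same-post pairs versus mismatched pairs, as in common vision-language pre-training setups) are illustrative assumptions, not the paper's confirmed configuration.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of one example for the main classification task."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def binary_cross_entropy(prob: float, target: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a single predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def itm_loss(match_probs: Sequence[float], targets: Sequence[int]) -> float:
    """Assumed ITM formulation: classify image-text pairs as matched or not.

    match_probs[k] is the model's probability that pair k comes from the same post;
    targets[k] is 1 for a genuine same-post pair and 0 for a mismatched pair
    (e.g. an image paired with the text of another post).
    """
    losses = [binary_cross_entropy(p, y) for p, y in zip(match_probs, targets)]
    return sum(losses) / len(losses)


def joint_loss(main_logits: Sequence[float], main_label: int, itc: float, itm: float,
               lambda_itc: float = 1.0, lambda_itm: float = 1.0) -> float:
    """Main-task loss plus weighted auxiliary losses (hypothetical weights)."""
    return (softmax_cross_entropy(main_logits, main_label)
            + lambda_itc * itc + lambda_itm * itm)


# Toy usage: one genuine pair and one mismatched pair, plus a precomputed ITC value.
itm = itm_loss([0.9, 0.2], [1, 0])
total = joint_loss([2.0, 0.5], 0, itc=0.3, itm=itm, lambda_itc=0.5, lambda_itm=0.5)
print(round(total, 4))
```

Setting both weights to zero reduces the sketch to ordinary fine-tuning on the main task alone; how the paper actually weights the auxiliary terms is not stated in this summary.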
Findings
Consistent F1 improvements of up to 2.6 points across datasets
Auxiliary tasks effectively bridge semantic gaps between image and text
Different auxiliary tasks excel in different scenarios
Abstract
Effectively leveraging multimodal information from social media posts is essential to various downstream tasks such as sentiment analysis, sarcasm detection or hate speech classification. Jointly modeling text and images is challenging because cross-modal semantics might be hidden or the relation between image and text is weak. However, prior work on multimodal classification of social media posts has not yet addressed these challenges. In this work, we present an extensive study on the effectiveness of using two auxiliary losses jointly with the main task when fine-tuning multimodal models. First, Image-Text Contrastive (ITC) is designed to minimize the distance between image-text representations within a post, thereby effectively bridging the gap between posts where the image plays an important role in conveying the post's meaning. Second, Image-Text Matching (ITM) enhances the…
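The ITC idea of pulling together the image and text representations of the same post can be illustrated with the in-batch contrastive (InfoNCE) formulation common in CLIP- and ALBEF-style models: the image and text of one post form the positive pair, and all other pairings in the batch act as negatives. This is a hedged sketch; the cosine similarity, the temperature of 0.07, the symmetric averaging, and all function names are assumptions, and the paper's exact loss may differ.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy_row(logits: List[float], target: int) -> float:
    """-log softmax(logits)[target], computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def itc_loss(img: List[List[float]], txt: List[List[float]],
             temperature: float = 0.07) -> float:
    """Symmetric in-batch contrastive loss (assumed ITC formulation).

    img[i] and txt[i] are the image and text embeddings of post i (the positive
    pair); every txt[j] with j != i is a negative for img[i], and vice versa.
    """
    n = len(img)
    sim = [[cosine(img[i], txt[j]) / temperature for j in range(n)] for i in range(n)]
    # Image-to-text: each row should peak at its own post's text.
    i2t = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    # Text-to-image: each column should peak at its own post's image.
    t2i = sum(cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return (i2t + t2i) / 2


# Toy batch of two posts with 2-d embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = itc_loss(images, [[0.9, 0.1], [0.1, 0.9]])  # each text resembles its own image
swapped = itc_loss(images, [[0.1, 0.9], [0.9, 0.1]])  # texts paired with the wrong posts
print(aligned < swapped)  # True
```

Swapping the texts between the two posts sharply raises the loss; minimizing it during fine-tuning is what draws same-post image and text representations together.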
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Text and Document Classification Technologies
