Argumentative Stance Prediction: An Exploratory Study on Multimodality and Few-Shot Learning
Arushi Sharma, Abhibha Gupta, Maneesh Bilalpur

TL;DR
This study evaluates the role of images and large language models (LLMs) in stance prediction on social issues. It finds that an ensemble of fine-tuned text models outperforms both multimodal models and few-shot LLM prompting, and that multimodal models do better when images are summarized as natural language.
Contribution
The paper compares fine-tuned unimodal models, fine-tuned multimodal models, and few-shot large language models for stance prediction, highlighting the effectiveness of fine-tuned text models and the benefit of summarizing images as text.
Findings
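An ensemble of fine-tuned text-based language models achieves an F1-score of 0.817.
The multimodal approach achieves an F1-score of 0.677.
Text-based few-shot prediction with a recent state-of-the-art LLM achieves an F1-score of 0.550.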
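To make the reported metric concrete, here is a minimal sketch of how predictions from several text models could be combined and scored. The summary does not say how the ensemble was built or which F1 variant was reported, so the majority-vote rule, the single-class F1, the label names `support` and `oppose`, and all of the toy predictions below are illustrative assumptions, not the authors' setup or data.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote (assumed rule)."""
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest count for this item.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

def f1_score(gold, predicted, positive="support"):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels and predictions, invented purely for illustration.
gold    = ["support", "oppose", "support", "oppose", "support"]
model_a = ["support", "oppose", "oppose", "oppose", "support"]
model_b = ["support", "support", "support", "oppose", "oppose"]
model_c = ["oppose", "oppose", "support", "support", "support"]

ensemble = majority_vote([model_a, model_b, model_c])
print("model_a F1:", f1_score(gold, model_a))
print("ensemble F1:", f1_score(gold, ensemble))
```

In this toy case each model makes a different mistake, so the vote corrects all of them. Real models often make correlated errors, in which case an ensemble gains much less.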
Abstract
To advance argumentative stance prediction as a multimodal problem, the First Shared Task in Multimodal Argument Mining hosted stance prediction on the crucial social topics of gun control and abortion. Our exploratory study attempts to evaluate the necessity of images for stance prediction in tweets and to compare out-of-the-box text-based large language models (LLMs) in few-shot settings against fine-tuned unimodal and multimodal models. Our work suggests that an ensemble of fine-tuned text-based language models (0.817 F1-score) outperforms both the multimodal (0.677 F1-score) and text-based few-shot prediction using a recent state-of-the-art LLM (0.550 F1-score). In addition to the differences in performance, our findings suggest that the multimodal models tend to perform better when image content is summarized as natural language over their native pixel structure and, using in-context examples…
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Hate Speech and Cyberbullying Detection
