Zero-Shot Warning Generation for Misinformative Multimodal Content
Giovanni Pio Delvecchio, Huy Hong Nguyen, Isao Echizen

TL;DR
This paper introduces a model that detects out-of-context multimodal misinformation through cross-modality consistency checks with minimal training, along with a lightweight variant and a zero-shot task that generates contextualized warnings for debunking.
Contribution
It presents a detection method for multimodal misinformation based on cross-modality consistency checks, a lightweight model that stays competitive with one-third of the parameters, and a zero-shot task for generating explanatory warnings.
Findings
Competitive performance with one-third of the parameters
Effective zero-shot warning generation
Qualitative and human evaluation results
Abstract
The widespread prevalence of misinformation poses significant societal concerns. Out-of-context misinformation, where authentic images are paired with false text, is particularly deceptive and easily misleads audiences. Most existing detection methods primarily evaluate image-text consistency but often lack sufficient explanations, which are essential for effectively debunking misinformation. We present a model that detects multimodal misinformation through cross-modality consistency checks, requiring minimal training time. Additionally, we propose a lightweight model that achieves competitive performance using only one-third of the parameters. We also introduce a dual-purpose zero-shot learning task for generating contextualized warnings, enabling automated debunking and enhancing user comprehension. Qualitative and human evaluations of the generated warnings highlight both the…
Taxonomy
Topics: Misinformation and Its Impacts · Topic Modeling · Advanced Text Analysis Techniques
