Multimodal Misinformation Detection using Large Vision-Language Models
Sahar Tahmasebi, Eric Müller-Budack, Ralph Ewerth

TL;DR
This paper explores the use of large vision-language models (LVLMs) for zero-shot multimodal misinformation detection, integrating evidence retrieval and fact verification to improve accuracy and generalization.
Contribution
It introduces a novel re-ranking method for multimodal evidence retrieval and a multimodal fact verification approach using LVLMs, addressing evidence completeness and zero-shot detection.
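To make "re-ranking for multimodal evidence retrieval" concrete, here is a minimal sketch of a generic multimodal re-ranker. It is not the authors' method: it assumes each claim and each candidate evidence item already has a text embedding and an image embedding (how these are produced is out of scope), and it scores candidates with a weighted sum of text and image cosine similarity. The names `Evidence`, `rerank`, and the weight `alpha` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    doc_id: str
    text_vec: List[float]   # hypothetical precomputed text embedding
    image_vec: List[float]  # hypothetical precomputed image embedding


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank(claim_text_vec: List[float], claim_image_vec: List[float],
           candidates: List[Evidence], alpha: float = 0.5,
           top_k: int = 3) -> List[str]:
    """Order candidates by alpha * text similarity + (1 - alpha) * image similarity."""
    scored = []
    for ev in candidates:
        score = (alpha * cosine(claim_text_vec, ev.text_vec)
                 + (1 - alpha) * cosine(claim_image_vec, ev.image_vec))
        scored.append((score, ev.doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Usage with toy 2-d embeddings: "b" matches the claim image and partly the text.
cands = [
    Evidence("a", [1.0, 0.0], [1.0, 0.0]),
    Evidence("b", [1.0, 1.0], [0.0, 1.0]),
    Evidence("c", [0.0, 1.0], [1.0, 0.0]),
]
print(rerank([1.0, 0.0], [0.0, 1.0], cands, top_k=2))  # ['b', 'a']
```

Combining both modalities in the score lets an item whose image matches the claim outrank one that only matches textually; the balance is controlled by `alpha`, which here is an arbitrary illustrative default.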
Findings
Outperforms the baseline in evidence retrieval accuracy
Achieves higher fact verification accuracy on two datasets
Demonstrates better cross-dataset generalization
Abstract
The increasing proliferation of misinformation and its alarming impact have motivated both industry and academia to develop approaches for misinformation detection and fact checking. Recent advances in large language models (LLMs) have shown remarkable performance in various tasks, but whether and how LLMs could help with misinformation detection remains relatively underexplored. Most existing state-of-the-art approaches either do not consider evidence and focus solely on claim-related features, or assume that the evidence is provided. The few approaches that treat evidence retrieval as part of misinformation detection rely on fine-tuning models. In this paper, we investigate the potential of LLMs for misinformation detection in a zero-shot setting. We incorporate an evidence retrieval component into the process as it is crucial to gather pertinent information from various sources to…
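The zero-shot verification step described above can be pictured as prompting a model with the claim plus retrieved evidence and reading off a verdict, with no fine-tuning. The sketch below shows only that prompt-and-parse scaffolding under stated assumptions: the binary label set `TRUE`/`FALSE`, the prompt wording, and the helpers `build_prompt` and `parse_verdict` are hypothetical, and the actual LVLM call (which would also receive the claim image) is deliberately omitted.

```python
from typing import List

LABELS = ("TRUE", "FALSE")  # hypothetical binary label set


def build_prompt(claim: str, evidence: List[str]) -> str:
    """Assemble a zero-shot prompt from the claim text and retrieved evidence snippets."""
    lines = [
        "You are a fact-checker. Using only the evidence below, decide whether "
        "the claim and its attached image are TRUE or FALSE.",
        "",
        f"Claim: {claim}",
        "Evidence:",
    ]
    for i, snippet in enumerate(evidence, start=1):
        lines.append(f"[{i}] {snippet}")
    lines.append("Answer with exactly one word: TRUE or FALSE.")
    return "\n".join(lines)


def parse_verdict(output: str) -> str:
    """Return the first label word found in the model output, else 'UNKNOWN'."""
    for token in output.upper().replace(".", " ").split():
        if token in LABELS:
            return token
    return "UNKNOWN"


prompt = build_prompt("Photo shows flooding in city X last week.",
                      ["Reverse image search finds the photo online in 2015."])
print(parse_verdict("False. The image predates the event."))  # FALSE
```

Falling back to `UNKNOWN` keeps malformed model replies from being silently counted as a verdict, which matters when no fine-tuning constrains the output format.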
Taxonomy
Topics: Misinformation and Its Impacts
Methods: Sparse Evolutionary Training · Focus
