Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs

Fengzhu Zeng; Wenqian Li; Wei Gao; Yan Pang

arXiv:2409.19656·cs.CL·October 1, 2024

TL;DR

This paper introduces a method for improving multimodal misinformation detection by selecting synthetic training data that matches the real-world data distribution, which improves the performance of a small (13B) multimodal LLM on real-world fact-checking datasets.

Contribution

The authors propose two model-agnostic data selection techniques to bridge the distribution gap between synthetic and real-world data for misinformation detection.

Findings

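01 Selecting distribution-matched synthetic data improves detection performance on real-world fact-checking datasets

02 A small MLLM (13B) surpasses GPT-4V after data selection

03 Synthetic data can be used effectively to train detectors for real-world multimodal misinformation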

Abstract

Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V.
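The abstract does not spell out the two selection methods, so the following is only an illustrative sketch of the general idea of distribution-matched data selection, not the authors' procedure. It assumes each example has already been mapped to a fixed-size embedding vector, and it ranks synthetic examples by cosine similarity to the centroid of the real-world embeddings. The function name `select_closest_to_real`, the centroid criterion, and the toy vectors are all hypothetical.

```python
import numpy as np


def select_closest_to_real(synthetic_emb, real_emb, k):
    """Return indices of the k synthetic rows most similar to the real-data centroid.

    Assumption: rows are precomputed embeddings; the centroid-plus-cosine
    criterion is a stand-in for illustration, not the paper's method.
    """
    synthetic_emb = np.asarray(synthetic_emb, dtype=float)
    real_emb = np.asarray(real_emb, dtype=float)

    # Summarize the real-world distribution by its mean embedding.
    centroid = real_emb.mean(axis=0)

    # Cosine similarity between every synthetic row and the centroid.
    syn_norms = np.linalg.norm(synthetic_emb, axis=1)
    centroid_norm = np.linalg.norm(centroid)
    sims = synthetic_emb @ centroid / (syn_norms * centroid_norm + 1e-12)

    # Highest similarity first; keep the top k indices.
    order = np.argsort(-sims, kind="stable")
    return order[:k]


# Toy data: two real embeddings and four synthetic candidates.
real = np.array([[1.0, 0.0], [0.9, 0.1]])
synthetic = np.array([[0.0, 1.0], [1.0, 0.1], [0.8, 0.0], [-1.0, 0.0]])

selected = select_closest_to_real(synthetic, real, k=2)
print(selected)
```

In this toy setup the two candidates pointing in roughly the same direction as the real data are kept, and the orthogonal and opposite ones are dropped. A detector would then be fine-tuned only on the selected subset.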

Taxonomy

Topics: Spam and Phishing Detection · Misinformation and Its Impacts