VLDBench: Evaluating Multimodal Disinformation with Regulatory Alignment

Shaina Raza; Ashmal Vayani; Aditya Jain; Aravind Narayanan; Vahid Reza Khazaie; Syed Raza Bashir; Elham Dolatabadi; Gias Uddin; Christos Emmanouilidis; Rizwan Qureshi; Mubarak Shah

arXiv:2502.11361·cs.CL·December 23, 2025

TL;DR

VLDBench is a comprehensive benchmark dataset designed to evaluate and improve the detection of multimodal disinformation involving both text and images, addressing a critical gap in AI safety research.

Contribution

It introduces the first large-scale, high-quality dataset for multimodal disinformation detection, supporting both unimodal and multimodal analysis with expert annotations.

Findings

01. Visual cues significantly improve detection accuracy by 5-35%

02. State-of-the-art models perform better with multimodal data

03. VLDBench enables evaluation, fine-tuning, and robustness testing
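
To make the first finding concrete, here is a minimal sketch of how an accuracy gain from adding images could be measured. It is not the authors' evaluation code. The label lists are made-up toy data, the function names `accuracy` and `modality_gain` are hypothetical, and the sketch assumes the reported 5-35% is an absolute difference in percentage points, which this summary does not state.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty inputs")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def modality_gain(
    y_true: Sequence[int],
    text_only_pred: Sequence[int],
    multimodal_pred: Sequence[int],
) -> float:
    """Accuracy gain of text + image over text-only, in percentage points.

    Assumption: the gain is an absolute difference, not a relative change.
    """
    return 100.0 * (
        accuracy(y_true, multimodal_pred) - accuracy(y_true, text_only_pred)
    )


# Toy data (hypothetical): 1 = disinformation, 0 = not disinformation.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
text_only = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]   # 7 of 10 correct
multimodal = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]  # 9 of 10 correct

gain = modality_gain(labels, text_only, multimodal)
print(f"text-only accuracy:  {accuracy(labels, text_only):.2f}")
print(f"multimodal accuracy: {accuracy(labels, multimodal):.2f}")
print(f"gain: {gain:.1f} percentage points")
```

On this toy data the gain is about 20 percentage points. Both runs must be scored against the same labeled examples for the comparison to be meaningful.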

Abstract

Detecting disinformation that blends manipulated text and images has become increasingly challenging, as AI tools make synthetic content easy to generate and disseminate. While most existing AI safety benchmarks focus on single-modality misinformation (i.e., false content shared without intent to deceive), intentional multimodal disinformation, such as propaganda or conspiracy theories that imitate credible news, remains largely unaddressed. We introduce the Vision-Language Disinformation Detection Benchmark (VLDBench), the first large-scale resource supporting both unimodal (text-only) and multimodal (text + image) disinformation detection. VLDBench comprises approximately 62,000 labeled text-image pairs across 13 categories, curated from 58 news outlets. Using a semi-automated pipeline followed by expert review, 22 domain experts invested over 500 hours to produce high-quality…

Code & Models

Repositories

VectorInstitute/VLDBench
pytorch

Datasets

vector-institute/VLDBench
dataset· 28 dl

Taxonomy

Topics: Misinformation and Its Impacts