AVerImaTeC: A Dataset for Automatic Verification of Image-Text Claims with Evidence from the Web

Rui Cao; Zifeng Ding; Zhijiang Guo; Michael Schlichtkrull; Andreas Vlachos

arXiv:2505.17978·cs.CL·October 8, 2025

TL;DR

AVerImaTeC is a new dataset with 1,297 real-world image-text claims, annotated with web evidence and reasoning, designed to improve automated verification of claims and address common fact-checking challenges.

Contribution

The paper introduces AVerImaTeC, a dataset with evidence annotations and a novel evaluation method for verifying image-text claims using open-web evidence.

Findings

01. High inter-annotator agreement on verdicts (κ = 0.742)

02. 74.7% consistency in QA pair annotations

03. Established baseline models for evidence retrieval and claim verification
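
To give a sense of what a verdict agreement score such as κ = 0.742 measures, here is a minimal sketch of Cohen's kappa for two annotators. This page does not say which kappa variant the paper uses, so Cohen's kappa is an assumption here, and the function name and toy verdict labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: the kappa variant used in the paper is not stated on this
    page; Cohen's kappa is shown only as an illustration.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label rates,
    # summed over all labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    all_labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[l] * count_b[l] for l in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy verdicts (illustrative labels, not the dataset's actual label set).
annotator_1 = ["supported", "supported", "refuted", "refuted"]
annotator_2 = ["supported", "supported", "refuted", "supported"]
print(cohen_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for the agreement two annotators would reach by chance, which is why it is usually reported instead of a plain match rate.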

Abstract

Textual claims are often accompanied by images to enhance their credibility and spread on social media, but this also raises concerns about the spread of misinformation. Existing datasets for automated verification of image-text claims remain limited, as they often consist of synthetic claims and lack evidence annotations to capture the reasoning behind the verdict. In this work, we introduce AVerImaTeC, a dataset consisting of 1,297 real-world image-text claims. Each claim is annotated with question-answer (QA) pairs containing evidence from the web, reflecting a decomposed reasoning regarding the verdict. We mitigate common challenges in fact-checking datasets such as contextual dependence, temporal leakage, and evidence insufficiency, via claim normalization, temporally constrained evidence annotation, and a two-stage sufficiency check. We assess the consistency of the annotation in…
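
The annotation structure described in the abstract (a claim with images, decomposed into QA pairs that carry web evidence, with evidence restricted in time) could be modelled roughly as below. This is a sketch under assumptions: every class name, field name, and the "published strictly before the claim date" rule are hypothetical illustrations, not the dataset's actual schema or the authors' exact constraint.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Evidence:
    url: str          # hypothetical field: source page of the evidence
    published: date   # hypothetical field: publication date of the page
    text: str


@dataclass
class QAPair:
    question: str
    answer: str
    evidence: Evidence


@dataclass
class Claim:
    text: str
    image_paths: list[str]
    claim_date: date
    qa_pairs: list[QAPair] = field(default_factory=list)


def temporally_valid(claim: Claim) -> list[QAPair]:
    """Keep QA pairs whose evidence predates the claim.

    Assumption: "temporally constrained" is modelled here as
    published < claim_date; the paper's exact rule is not given on this page.
    """
    return [qa for qa in claim.qa_pairs
            if qa.evidence.published < claim.claim_date]


# Made-up example data for illustration only.
claim = Claim(
    text="Photo shows event X",
    image_paths=["img_0.jpg"],
    claim_date=date(2023, 5, 1),
    qa_pairs=[
        QAPair("When was the photo taken?", "In 2019.",
               Evidence("https://example.org/a", date(2019, 3, 2), "...")),
        QAPair("Was the claim fact-checked?", "Yes.",
               Evidence("https://example.org/b", date(2023, 6, 1), "...")),
    ],
)
print(len(temporally_valid(claim)))
```

Filtering out evidence published after the claim is one simple way to avoid temporal leakage, for example a later fact-checking article that already states the verdict.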

Code & Models

Datasets

Rui4416/AVerImaTeC
dataset · 114 dl

Videos

AVerImaTeC: A Dataset for Automatic Verification of Image-Text Claims with Evidence from the Web· slideslive