AVerImaTeC: A Dataset for Automatic Verification of Image-Text Claims with Evidence from the Web
Rui Cao, Zifeng Ding, Zhijiang Guo, Michael Schlichtkrull, Andreas Vlachos

TL;DR
AVerImaTeC is a new dataset of 1,297 real-world image-text claims, each annotated with web evidence and question-answer reasoning. It is built to support automated claim verification while mitigating contextual dependence, temporal leakage, and evidence insufficiency.
Contribution
The paper introduces AVerImaTeC, a dataset with evidence annotations and a novel evaluation method for verifying image-text claims using open-web evidence.
Findings
High inter-annotator agreement on verdicts (κ=0.742)
74.7% consistency in QA pair annotations
Established baseline models for evidence retrieval and claim verification
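For readers unfamiliar with the κ statistic, the sketch below shows how a chance-corrected agreement score can be computed. The summary does not say which kappa variant the authors used; this example assumes Cohen's kappa for two annotators, and the verdict labels in the usage lines are made up for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators and nominal labels. The paper may use a
    different kappa variant; this is only an illustration of the idea.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if each annotator labelled at random
    # according to their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical verdicts from two annotators on four claims.
annotator_1 = ["Supported", "Supported", "Refuted", "Refuted"]
annotator_2 = ["Supported", "Supported", "Refuted", "Supported"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 3 of 4 items (0.75) while chance agreement is 0.5, so kappa is 0.5. Values near 1 indicate agreement well beyond chance.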
Abstract
Textual claims are often accompanied by images to enhance their credibility and spread on social media, but this also raises concerns about the spread of misinformation. Existing datasets for automated verification of image-text claims remain limited, as they often consist of synthetic claims and lack evidence annotations to capture the reasoning behind the verdict. In this work, we introduce AVerImaTeC, a dataset consisting of 1,297 real-world image-text claims. Each claim is annotated with question-answer (QA) pairs containing evidence from the web, reflecting a decomposed reasoning regarding the verdict. We mitigate common challenges in fact-checking datasets such as contextual dependence, temporal leakage, and evidence insufficiency, via claim normalization, temporally constrained evidence annotation, and a two-stage sufficiency check. We assess the consistency of the annotation in…
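To make the annotation structure concrete, here is a minimal sketch of how a claim with QA pairs and web evidence might be represented, and how a temporal constraint could be applied to avoid leakage. All class and field names are hypothetical and not taken from the dataset release, and the rule used (keep only evidence published strictly before the claim date) is an assumption about what "temporally constrained" means.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Evidence:
    url: str
    published: date  # hypothetical field: publication date of the web page
    text: str


@dataclass
class QAPair:
    question: str
    answer: str
    evidence: list[Evidence] = field(default_factory=list)


@dataclass
class Claim:
    text: str
    image_paths: list[str]
    claim_date: date
    qa_pairs: list[QAPair] = field(default_factory=list)


def drop_leaky_evidence(claim: Claim) -> Claim:
    """Return a copy of the claim keeping only evidence published
    strictly before the claim date (assumed leakage rule)."""
    filtered = [
        QAPair(
            question=qa.question,
            answer=qa.answer,
            evidence=[e for e in qa.evidence if e.published < claim.claim_date],
        )
        for qa in claim.qa_pairs
    ]
    return Claim(claim.text, list(claim.image_paths), claim.claim_date, filtered)


# Illustrative record with placeholder values.
example = Claim(
    text="Photo shows flooding in city X last week.",
    image_paths=["claim_image.jpg"],
    claim_date=date(2023, 6, 1),
    qa_pairs=[
        QAPair(
            question="When was the photo first published?",
            answer="It appeared online in 2019.",
            evidence=[
                Evidence("https://example.org/a", date(2019, 8, 3), "Original post."),
                Evidence("https://example.org/b", date(2023, 6, 5), "Later fact-check."),
            ],
        )
    ],
)
clean = drop_leaky_evidence(example)
```

After filtering, only the 2019 source remains, since the second page was published after the claim and could leak the verdict.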
