VILLAIN at AVerImaTeC: Verifying Image-Text Claims via Multi-Agent Collaboration

Jaeyoon Jung, Yejun Yoon, and Kunwoo Park

arXiv:2602.04587 · cs.CL · February 23, 2026


TL;DR

VILLAIN is a multimodal fact-checking system that verifies image-text claims through multi-agent collaboration, combining evidence retrieval, evidence analysis, question-answer generation, and verdict prediction. It ranked first in the AVerImaTeC shared task.

Contribution

The paper introduces VILLAIN, a prompt-based multi-agent framework for multimodal fact-checking that integrates evidence retrieval, evidence analysis, and verdict prediction, and that ranked first on the AVerImaTeC shared-task leaderboard.

Findings

1. Ranked first on the AVerImaTeC leaderboard across all evaluation metrics
2. Effective multi-agent collaboration improves verification accuracy
3. System code is publicly available for reproducibility

Abstract

This paper describes VILLAIN, a multimodal fact-checking system that verifies image-text claims through prompt-based multi-agent collaboration. For the AVerImaTeC shared task, VILLAIN employs vision-language model agents across multiple stages of fact-checking. Textual and visual evidence is retrieved from the knowledge store enriched through additional web collection. To identify key information and address inconsistencies among evidence items, modality-specific and cross-modal agents generate analysis reports. In the subsequent stage, question-answer pairs are produced based on these reports. Finally, the Verdict Prediction agent produces the verification outcome based on the image-text claim and the generated question-answer pairs. Our system ranked first on the leaderboard across all evaluation metrics. The source code is publicly available at https://github.com/ssu-humane/VILLAIN.
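The abstract describes a staged pipeline: retrieve textual and visual evidence, have modality-specific and cross-modal agents write analysis reports, turn those reports into question-answer pairs, and let a Verdict Prediction agent decide from the claim plus the QA pairs. The Python sketch below illustrates only that data flow and is not VILLAIN's implementation (see the linked repository for that). Every name (`VLM`, `Claim`, `Evidence`, `retrieve`, `analyze`, `generate_qa`, `predict_verdict`, `stub_vlm`), the word-overlap retriever, the prompt strings, the `question || answer` output format, the assumption that the cross-modal agent reads the modality-specific reports, and the four-way label set are hypothetical. The demo inputs are invented, and a canned stub stands in for a real vision-language model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical VLM interface: (prompt, image references) -> generated text.
# A real system would wrap a vision-language model API call here.
VLM = Callable[[str, list[str]], str]

# Hypothetical label set for illustration; check the shared-task definition.
LABELS = ("Supported", "Refuted", "Not Enough Evidence", "Conflicting Evidence/Cherrypicking")

@dataclass
class Claim:
    text: str
    images: list[str]

@dataclass
class Evidence:
    modality: str  # "text" or "image"
    content: str   # passage text or an image reference

@dataclass
class QAPair:
    question: str
    answer: str

def retrieve(claim: Claim, store: list[Evidence], k: int = 3) -> list[Evidence]:
    """Toy retriever (assumption): rank text evidence by word overlap with the claim."""
    words = set(claim.text.lower().split())
    texts = sorted(
        (e for e in store if e.modality == "text"),
        key=lambda e: len(words & set(e.content.lower().split())),
        reverse=True,
    )
    images = [e for e in store if e.modality == "image"]
    return texts[:k] + images[:k]

def analyze(vlm: VLM, claim: Claim, evidence: list[Evidence]) -> dict[str, str]:
    """Modality-specific agents report first; the cross-modal agent is assumed to read their reports."""
    text_ev = "\n".join(e.content for e in evidence if e.modality == "text")
    image_ev = [e.content for e in evidence if e.modality == "image"]
    reports = {
        "text": vlm(f"[TEXT ANALYSIS] Claim: {claim.text}\nEvidence:\n{text_ev}", claim.images),
        "image": vlm(f"[IMAGE ANALYSIS] Claim: {claim.text}", claim.images + image_ev),
    }
    reports["cross_modal"] = vlm(
        f"[CROSS-MODAL ANALYSIS] Claim: {claim.text}\n"
        f"Text report: {reports['text']}\nImage report: {reports['image']}",
        claim.images + image_ev,
    )
    return reports

def generate_qa(vlm: VLM, claim: Claim, reports: dict[str, str]) -> list[QAPair]:
    """Ask for one 'question || answer' pair per line and parse the output."""
    raw = vlm(f"[QA GENERATION] Claim: {claim.text}\n" + "\n".join(reports.values()), claim.images)
    pairs = []
    for line in raw.splitlines():
        if "||" in line:
            question, answer = line.split("||", 1)
            pairs.append(QAPair(question.strip(), answer.strip()))
    return pairs

def predict_verdict(vlm: VLM, claim: Claim, qa_pairs: list[QAPair]) -> str:
    """The verdict agent sees the claim and the QA pairs, not the raw evidence."""
    qa_text = "\n".join(f"Q: {p.question}\nA: {p.answer}" for p in qa_pairs)
    raw = vlm(f"[VERDICT] Claim: {claim.text}\n{qa_text}\nLabels: {', '.join(LABELS)}", claim.images)
    # Fall back to a conservative label if the output is not exactly one label.
    return raw.strip() if raw.strip() in LABELS else "Not Enough Evidence"

def verify(vlm: VLM, claim: Claim, store: list[Evidence]) -> tuple[str, list[QAPair]]:
    evidence = retrieve(claim, store)
    reports = analyze(vlm, claim, evidence)
    qa_pairs = generate_qa(vlm, claim, reports)
    return predict_verdict(vlm, claim, qa_pairs), qa_pairs

# Demo with a canned stub instead of a real model; all inputs are invented.
def stub_vlm(prompt: str, images: list[str]) -> str:
    if prompt.startswith("[QA GENERATION]"):
        return "Where was the photo taken? || The evidence places it in Lisbon."
    if prompt.startswith("[VERDICT]"):
        return "Refuted"
    return f"report based on {len(images)} image(s)"

claim = Claim("This photo shows a flood in Paris in 2024.", ["claim.jpg"])
store = [
    Evidence("text", "The photo was taken in Lisbon in 2019."),
    Evidence("text", "Unrelated news item."),
    Evidence("image", "reverse_search_hit.jpg"),
]
label, qa_pairs = verify(stub_vlm, claim, store)
print(label)  # Refuted
```

Keeping each agent behind the same `(prompt, images) -> text` callable makes the stages easy to swap or test in isolation. Routing the verdict through generated QA pairs, rather than through the raw evidence, mirrors the final stage as the abstract describes it.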


Datasets

humane-lab/AVerImaTeC-Filled · dataset · 16 downloads

Videos

VILLAIN at AVerImaTeC: Verifying Image–Text Claims via Multi-Agent Collaboration · Underline

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)