ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios

António Loison; Quentin Macé; Antoine Edy; Victor Xing; Tom Balough; Gabriel Moreira; Bo Liu; Manuel Faysse; Céline Hudelot; Gautier Viaud

arXiv:2601.08620·cs.AI·April 22, 2026

TL;DR

ViDoRe v3 is a multimodal benchmark for retrieval-augmented generation. It targets complex real-world scenarios that involve visual elements, multi-document synthesis, and multilingual queries, and is meant for evaluating and improving state-of-the-art models.

Contribution

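The paper presents ViDoRe v3, a new benchmark for multimodal RAG in complex, real-world contexts. It spans 10 datasets across professional domains (~26,000 document pages, 3,099 human-verified queries, each in 6 languages), with annotations for retrieval relevance, bounding box localization, and verified reference answers.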

Findings

01 Visual retrievers outperform textual ones.

02 Late-interaction models and reranking improve performance.

03 Hybrid and visual-only contexts enhance answer quality.
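
The second finding refers to late-interaction retrievers. As a rough illustration of what "late interaction" usually means, the sketch below scores a page with a ColBERT-style MaxSim rule: each query token embedding is matched to its best page embedding, and the matches are summed. This is not taken from the paper; the function names (`maxsim_score`, `rank_pages`), the toy 2-dimensional vectors, and the use of a plain dot product are all assumptions made for illustration, and the models evaluated in the benchmark may differ in detail.

```python
from typing import List, Mapping, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction (MaxSim) score, assuming non-empty inputs.

    For every query token embedding, keep the similarity of its best
    matching page embedding, then sum those maxima over the query.
    """
    return sum(max(dot(q, d) for d in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: Sequence[Vector],
               pages: Mapping[str, Sequence[Vector]]) -> List[str]:
    """Return page ids sorted from highest to lowest MaxSim score."""
    return sorted(pages,
                  key=lambda pid: maxsim_score(query_vecs, pages[pid]),
                  reverse=True)


# Toy data (hypothetical): two query token vectors, two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],
    "page_b": [[1.0, 0.0], [0.5, 0.5]],
}
ranking = rank_pages(query, pages)
```

Here `page_a` matches both query tokens exactly (score 2.0), while `page_b` only partially matches the second token (score 1.5), so `page_a` ranks first. A single-vector retriever would instead compress each page into one embedding before comparison.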

Abstract

Retrieval-Augmented Generation (RAG) pipelines must address challenges beyond simple single-document retrieval, such as interpreting visual elements (tables, charts, images), synthesizing information across documents, and providing accurate source grounding. Existing benchmarks fail to capture this complexity, often focusing on textual data, single-document comprehension, or evaluating retrieval and generation in isolation. We introduce ViDoRe v3, a comprehensive multimodal RAG benchmark featuring multi-type queries over visually rich document corpora. It covers 10 datasets across diverse professional domains, comprising ~26,000 document pages paired with 3,099 human-verified queries, each available in 6 languages. Through 12,000 hours of human annotation effort, we provide high-quality annotations for retrieval relevance, bounding box localization, and verified reference answers. Our…

Code & Models

Repositories

https://hf.co/vidore
