ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval

Quentin Macé; António Loison; Manuel Faysse

arXiv:2505.17166 · cs.IR · September 22, 2025

TL;DR

ViDoRe Benchmark V2 advances visual retrieval evaluation by introducing challenging, multilingual, and realistic scenarios, encouraging ongoing community-driven improvements in model performance.

Contribution

The paper presents a new benchmark with diverse datasets and realistic queries, addressing the saturation of ViDoRe Benchmark V1 and fostering progress in visual retrieval models.

Findings

1. Significant room for improvement in current models.
2. Insights into model generalization and multilingual capabilities.
3. Benchmark's design promotes continuous community engagement.

Abstract

The ViDoRe Benchmark V1 was approaching saturation with top models exceeding 90% nDCG@5, limiting its ability to discern improvements. ViDoRe Benchmark V2 introduces realistic, challenging retrieval scenarios via blind contextual querying, long and cross-document queries, and a hybrid synthetic and human-in-the-loop query generation process. It comprises four diverse, multilingual datasets and provides clear evaluation instructions. Initial results demonstrate substantial room for advancement and highlight insights on model generalization and multilingual capability. This benchmark is designed as a living resource, inviting community contributions to maintain relevance through future evaluations.
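To make the saturation figure concrete, here is a minimal sketch of how nDCG@5 can be computed for a single query. It is not taken from the paper or its evaluation code. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and the sketch assumes linear gain (the relevance grade itself) with a log2 rank discount, which is one common convention among several.

```python
import math

def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the top-k relevance grades (linear gain, assumed)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=5):
    """nDCG@k: DCG of the given ranking divided by the DCG of the ideal ordering.

    ranked_relevances holds the relevance grade of every candidate page,
    listed in the order the retriever ranked them.
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant page exists for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg

# Hypothetical query with one relevant page, retrieved at rank 2.
score = ndcg_at_k([0, 1, 0, 0, 0, 0])
```

A benchmark-level score is typically the mean of this value over all queries, so a model averaging above 0.90 is placing relevant pages at or near the top for almost every query, which leaves little headroom to separate stronger models.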

Code & Models

Models

ibm-granite/granite-vision-3.3-2b-embedding

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques