ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval
Quentin Macé, António Loison, Manuel Faysse

TL;DR
ViDoRe Benchmark V2 advances visual retrieval evaluation by introducing challenging, multilingual, and realistic scenarios, encouraging ongoing community-driven improvements in model performance.
Contribution
The paper presents a new benchmark with diverse datasets and realistic queries. It addresses the saturation of ViDoRe Benchmark V1 and aims to foster progress in visual retrieval models.
Findings
Current models leave significant room for improvement on the new benchmark.
Initial results give insights into model generalization and multilingual capabilities.
The benchmark is designed as a living resource that invites ongoing community contributions.
Abstract
The ViDoRe Benchmark V1 was approaching saturation with top models exceeding 90% nDCG@5, limiting its ability to discern improvements. ViDoRe Benchmark V2 introduces realistic, challenging retrieval scenarios via blind contextual querying, long and cross-document queries, and a hybrid synthetic and human-in-the-loop query generation process. It comprises four diverse, multilingual datasets and provides clear evaluation instructions. Initial results demonstrate substantial room for advancement and highlight insights on model generalization and multilingual capability. This benchmark is designed as a living resource, inviting community contributions to maintain relevance through future evaluations.
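For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how nDCG@5 can be computed for a single query. It is not the benchmark's evaluation code: the function names `dcg_at_k` and `ndcg_at_k` are hypothetical, the gain is assumed to be linear in the relevance label, and the ideal ranking is assumed to be derivable from the same list of judged documents.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=5):
    """nDCG@k for one query.

    ranked_relevances: relevance labels of all judged documents, listed in the
    order the retriever ranked them (assumption: the list covers every judged
    document, so sorting it yields the ideal ranking).
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        # No relevant document exists for this query; score it as 0 by convention.
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # One relevant page, retrieved at rank 2 out of 6 judged pages.
    print(round(ndcg_at_k([0, 1, 0, 0, 0, 0], k=5), 4))
```

A benchmark-level score is typically the mean of this value over all queries, so a figure above 90% means the relevant pages are almost always ranked at or near the top, which leaves little headroom to separate strong models.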
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
