MIRACL-VISION: A Large, multilingual, visual document retrieval benchmark

Radek Osmulski; Gabriel de Souza P. Moreira; Ronay Ak; Mengyao Xu; Benedikt Schifferer; Even Oldridge

arXiv:2505.11651·cs.IR·May 22, 2025

TL;DR

MIRACL-VISION is a comprehensive multilingual benchmark designed to evaluate visual document retrieval models across 18 languages, addressing limitations of existing benchmarks and highlighting the performance gap between visual and text-based retrieval methods.

Contribution

It introduces MIRACL-VISION, a new multilingual visual document retrieval benchmark based on the MIRACL dataset, with a novel method to filter easy negatives for more challenging evaluation.
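
The summary does not describe how easy negatives are filtered. As a rough illustration only, one common way to make a retrieval corpus harder is to keep each query's labeled positives plus the negatives an embedding model scores as most similar to the query. The sketch below assumes precomputed embedding vectors; the function name `filter_easy_negatives`, the `keep_per_query` parameter, and the dot-product scoring are hypothetical choices and are not taken from the paper.

```python
from typing import Dict, List, Set


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product, used here as a stand-in similarity score."""
    return sum(x * y for x, y in zip(a, b))


def filter_easy_negatives(
    query_vecs: Dict[str, List[float]],
    doc_vecs: Dict[str, List[float]],
    positives: Dict[str, Set[str]],
    keep_per_query: int,
) -> Set[str]:
    """Return the ids of documents to keep in the reduced corpus.

    Hypothetical procedure (an assumption, not the authors' method):
    keep every labeled positive, plus the `keep_per_query` negatives
    most similar to each query. Less similar ("easy") negatives are dropped.
    """
    kept: Set[str] = set()
    for qid, qvec in query_vecs.items():
        pos = positives.get(qid, set())
        kept |= pos
        # Rank this query's negatives from most to least similar.
        negatives = [doc_id for doc_id in doc_vecs if doc_id not in pos]
        negatives.sort(key=lambda doc_id: dot(qvec, doc_vecs[doc_id]), reverse=True)
        kept.update(negatives[:keep_per_query])
    return kept


# Toy data: one query, one positive, three negatives of decreasing similarity.
queries = {"q1": [1.0, 0.0]}
docs = {
    "d1": [1.0, 0.0],   # labeled positive
    "d2": [0.9, 0.1],   # hard negative
    "d3": [0.0, 1.0],   # easy negative
    "d4": [-1.0, 0.0],  # easy negative
}
labels = {"q1": {"d1"}}
reduced_corpus = filter_easy_negatives(queries, docs, labels, keep_per_query=1)
```

With `keep_per_query=1`, the toy corpus shrinks to the positive `d1` and the hardest negative `d2`.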

Findings

1. Visual models reach up to 59.7% lower retrieval accuracy than text models in multilingual retrieval.

2. Even in English, visual models trail text-based models by 12.1% in retrieval accuracy.

3. MIRACL-VISION provides a challenging benchmark for developing robust visual retrieval models.
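
The summary does not say whether these percentages are relative or absolute differences. Assuming they are relative drops measured against the text model's score, the arithmetic can be sketched as follows. The function name `relative_drop` and the example scores are hypothetical and are not results from the paper.

```python
def relative_drop(text_score: float, visual_score: float) -> float:
    """Percentage by which the visual score falls below the text score.

    Assumes a relative difference: 100 * (text - visual) / text.
    """
    if text_score <= 0:
        raise ValueError("text_score must be positive")
    return 100.0 * (text_score - visual_score) / text_score


# Made-up scores for illustration only.
example_gap = relative_drop(text_score=0.80, visual_score=0.60)
print(f"visual model is {example_gap:.1f}% lower")  # prints "visual model is 25.0% lower"
```

Under the alternative reading, an absolute difference would simply be `text_score - visual_score` expressed in percentage points.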

Abstract

Document retrieval is an important task for search and Retrieval-Augmented Generation (RAG) applications. Large Language Models (LLMs) have contributed to improving the accuracy of text-based document retrieval. However, documents with complex layouts and visual elements such as tables, charts and infographics are not perfectly represented in textual format. Recently, image-based document retrieval pipelines have become popular; these use visual large language models (VLMs) to retrieve relevant page images given a query. Current evaluation benchmarks for visual document retrieval are limited, as they focus primarily on the English language, rely on synthetically generated questions and offer a small corpus size. Therefore, we introduce MIRACL-VISION, a multilingual visual document retrieval evaluation benchmark. MIRACL-VISION covers 18 languages, and is an extension of the MIRACL dataset, a…

Code & Models

Datasets

nvidia/miracl-vision (dataset, 868 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
