M3DR: Towards Universal Multilingual Multimodal Document Retrieval
Adithya S Kolavi, Vyoman Jain

TL;DR
M3DR introduces a multilingual multimodal document retrieval framework that leverages synthetic data and contrastive training to achieve state-of-the-art cross-lingual and cross-modal retrieval across diverse languages and scripts.
Contribution
The paper presents M3DR, a novel framework that generalizes across languages, architectures, and retrieval paradigms, significantly improving multilingual multimodal document retrieval performance.
Findings
Achieves ~150% relative improvement on cross-lingual retrieval tasks.
Demonstrates consistent performance across 22 typologically diverse languages.
Introduces a comprehensive multilingual retrieval benchmark.
Abstract
Multimodal document retrieval systems have shown strong progress in aligning visual and textual content for semantic search. However, most existing approaches remain heavily English-centric, limiting their effectiveness in multilingual contexts. In this work, we present M3DR (Multilingual Multimodal Document Retrieval), a framework designed to bridge this gap across languages, enabling applicability across diverse linguistic and cultural contexts. M3DR leverages synthetic multilingual document data and generalizes across different vision-language architectures and model sizes, enabling robust cross-lingual and cross-modal alignment. Using contrastive training, our models learn unified representations for text and document images that transfer effectively across languages. We validate this capability on 22 typologically diverse languages, demonstrating consistent performance and…
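The contrastive training mentioned in the abstract can be illustrated with a small sketch. The excerpt above does not state the exact objective, so the code below assumes a common choice, a symmetric InfoNCE loss over in-batch negatives; the function names, the temperature value of 0.05, and the toy 3-dimensional vectors standing in for text-query and document-image embeddings are all hypothetical and not taken from the paper.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_info_nce(query_embs, doc_embs, temperature=0.05):
    """Assumed objective: average of query-to-document and document-to-query
    InfoNCE losses, with the other items in the batch acting as negatives."""
    q = [normalize(v) for v in query_embs]
    d = [normalize(v) for v in doc_embs]
    logits = [[dot(qi, dj) / temperature for dj in d] for qi in q]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))

# Toy embeddings (hypothetical): row i of `aligned` is the query for document i.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
mismatched = aligned[1:] + aligned[:1]  # each query now paired with the wrong document

loss_aligned = symmetric_info_nce(aligned, docs)
loss_mismatched = symmetric_info_nce(mismatched, docs)
print(loss_aligned < loss_mismatched)
```

In this sketch the loss is low when each query embedding sits closest to its own document embedding and high when the pairing is scrambled, which is the property a retrieval model trained this way relies on. In a real system the toy vectors would be replaced by encoder outputs for queries in one language and document images in another.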
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
