Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality
Xueguang Ma, Luyu Gao, Shengyao Zhuang, Jiaqi Samantha Zhan, Jamie Callan, Jimmy Lin

TL;DR
The paper introduces Tevatron 2.0, a comprehensive toolkit for scalable, multilingual, and multimodal document retrieval, featuring a unified dense retriever and the OmniEmbed embedding model for diverse data types.
Contribution
The paper presents a unified retrieval pipeline that supports retriever models at different scales, across multiple languages, and with various modalities, along with OmniEmbed, an embedding model that unifies several data modalities for retrieval tasks.
Findings
Achieves strong multilingual and multimodal retrieval performance.
Demonstrates cross-modality zero-shot retrieval capabilities.
Provides a versatile toolkit bridging academia and industry.
Abstract
Recent advancements in large language models (LLMs) have driven interest in billion-scale retrieval models with strong generalization across retrieval tasks and languages. Additionally, progress in large vision-language models has created new opportunities for multimodal retrieval. In response, we have updated the Tevatron toolkit, introducing a unified pipeline that enables researchers to explore retriever models at different scales, across multiple languages, and with various modalities. This demo paper highlights the toolkit's key features, bridging academia and industry by supporting efficient training, inference, and evaluation of neural retrievers. We showcase a unified dense retriever achieving strong multilingual and multimodal effectiveness, and conduct a cross-modality zero-shot study to demonstrate its research potential. Alongside, we release OmniEmbed, to the best of our…
Taxonomy
Topics: Semantic Web and Ontologies
