Nemotron ColEmbed V2: Top-Performing Late Interaction Embedding Models for Visual Document Retrieval
Gabriel de Souza P. Moreira, Ronay Ak, Mengyao Xu, Oliver Holworthy, Benedikt Schifferer, Zhiding Yu, Yauhen Babakhin, Radek Osmulski, Jiarui Cai, Ryan Chesler, Bo Liu, Even Oldridge

TL;DR
Nemotron ColEmbed V2 is a family of late interaction embedding models for visual document retrieval, released in several parameter sizes, that achieves state-of-the-art results on the ViDoRe benchmarks.
Contribution
The paper presents Nemotron ColEmbed V2, a family of models based on pre-trained VLMs that set new performance standards for visual document retrieval.
Findings
The 8B model ranks first on the ViDoRe V3 leaderboard
It achieves an average NDCG@10 of 63.42 on that benchmark
Techniques like cluster-based sampling and hard-negative mining improve performance
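For readers unfamiliar with the metric quoted above, here is a minimal sketch of how NDCG@10 is commonly computed for a single query. It is not the ViDoRe evaluation code. The function names are hypothetical, the gain is assumed to be linear in the relevance grade (some implementations use an exponential gain), and the input list is assumed to hold the grades of all judged documents in the order the system ranked them. A benchmark score such as 63.42 would then be the mean over queries and datasets, presumably reported on a 0 to 100 scale.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance grades.

    Assumption: linear gain, with a log2(rank + 1) discount for 1-based ranks.
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents were judged for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Toy query: relevant pages were ranked 1st and 3rd.
example_score = ndcg_at_k([1, 0, 1, 0, 0])
print(round(example_score, 4))
```

A perfect ranking scores 1.0 under this definition, and moving a relevant page further down the list lowers the score because of the logarithmic discount.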
Abstract
Retrieval-Augmented Generation (RAG) systems have been popular for generative applications, powering language models by injecting external knowledge. Companies have been trying to leverage their large catalog of documents (e.g. PDFs, presentation slides) in such RAG pipelines, whose first step is the retrieval component. Dense retrieval has been a popular approach, where embedding models are used to generate a dense representation of the user query that is closer to relevant content embeddings. More recently, VLM-based embedding models have become popular for visual document retrieval, as they preserve visual information and simplify the indexing pipeline compared to OCR text extraction. Motivated by the growing demand for visual document retrieval, we introduce Nemotron ColEmbed V2, a family of models that achieve state-of-the-art performance on the ViDoRe benchmarks. We release…
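To make the contrast between single-vector dense retrieval and late interaction concrete, the sketch below scores a query against a page both ways. It assumes the ColBERT-style MaxSim operator, which is the usual meaning of late interaction; the excerpt above does not spell out the exact scoring function, so treat this as an illustration and not as the authors' implementation. All function names and the toy two-dimensional embeddings are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def single_vector_score(query_vec, doc_vec):
    """Dense retrieval: one embedding per query and per document, cosine similarity."""
    return dot(normalize(query_vec), normalize(doc_vec))


def late_interaction_score(query_tokens, doc_tokens):
    """Assumed MaxSim scoring: each query token embedding is matched to its most
    similar document token (or image patch) embedding, and the maxima are summed."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


# Toy data: two query token embeddings, two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # covers both query tokens
page_b = [[1.0, 0.0], [1.0, 0.0]]              # covers only the first

print(late_interaction_score(query, page_a))
print(late_interaction_score(query, page_b))
```

The trade-off in this kind of design is that every page stores one vector per token or patch instead of a single vector, which increases index size in exchange for finer-grained matching.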
