Granite Embedding Multilingual R2 Models
Parul Awasthy, Aashka Trivedi, Yushu Yang, Ken Barker, Yulong Li, Bhavani Iyer, Martin Franz, Juergen Bross, Meet Doshi, Vignesh P, Vishwajeet Kumar, Todd Ward, Abraham Daniels, Madison Lee, Luis Lastras, Jaydeep Sen, Radu Florian

TL;DR
The paper presents the multilingual Granite Embedding R2 models, a family of encoder-based embedding models for dense retrieval across 200+ languages, reporting state-of-the-art overall retrieval performance and an enterprise-focused design.
Contribution
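Introduces two multilingual bi-encoder embedding models based on the ModernBERT architecture: a 311M-parameter full-size model and a 97M-parameter compact model built via model pruning and vocabulary selection. Both extend the English-focused R2 release with enhanced support for 52 languages and programming code, and a 32,768-token context window.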
Findings
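Reports state-of-the-art overall performance across multilingual and cross-lingual text search, code retrieval, long-document search, and reasoning retrieval datasets.
The 97M-parameter compact model achieves the highest retrieval score of any open multilingual embedding model under 100M parameters.
The models target enterprise use, with governance and open licensing.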
Abstract
We introduce the multilingual Granite Embedding R2 models, a family of encoder-based embedding models for enterprise-scale dense retrieval across 200+ languages. Extending our English-focused R2 release, these models add enhanced support for 52 languages and programming code, a 32,768-token context window (a 64x expansion over R1), and state-of-the-art overall performance across multilingual and cross-lingual text search, code retrieval, long-document search, and reasoning retrieval datasets. The release consists of two bi-encoder models based on the ModernBERT architecture with an expanded multilingual vocabulary: a 311M-parameter full-size, and a 97M-parameter compact model built via model pruning and vocabulary selection that achieves the highest retrieval score of any open multilingual embedding model under 100M parameters. The full-size also supports Matryoshka Representation…
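To make the retrieval setup in the abstract concrete, here is a minimal sketch of how a bi-encoder ranks documents: the query and each document are embedded independently, then compared by cosine similarity. It also shows the usual way Matryoshka-style embeddings are shortened, by keeping the leading coordinates and re-normalizing. The vectors, document IDs, and function names below are hypothetical toy values, not outputs or APIs of the Granite models, and the truncation step reflects the general Matryoshka technique rather than details confirmed by the (truncated) abstract.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def truncate_embedding(vec, dim):
    """Matryoshka-style shortening: keep the first `dim` coordinates, re-normalize."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    return l2_normalize(vec[:dim])


def rank(query_vec, doc_vecs, dim=None):
    """Return document IDs sorted by descending similarity to the query.

    If `dim` is given, every embedding is truncated to `dim` coordinates first.
    """
    if dim is not None:
        query_vec = truncate_embedding(query_vec, dim)
        doc_vecs = {k: truncate_embedding(v, dim) for k, v in doc_vecs.items()}
    scores = {doc_id: cosine(query_vec, v) for doc_id, v in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 4-dimensional toy embeddings (real models emit hundreds of dimensions).
QUERY = [0.9, 0.1, 0.3, 0.0]
DOCS = {
    "doc_a": [0.8, 0.2, 0.1, 0.1],
    "doc_b": [0.1, 0.9, 0.0, 0.4],
    "doc_c": [0.5, 0.5, 0.5, 0.5],
}

full_ranking = rank(QUERY, DOCS)          # uses all 4 dimensions
short_ranking = rank(QUERY, DOCS, dim=2)  # uses only the first 2 dimensions
print(full_ranking, short_ranking)
```

In this toy case the ranking is the same at both sizes, which is the behavior Matryoshka training aims for: shorter vectors reduce storage and search cost, usually at some loss in accuracy that has to be measured on real data.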
