Granite Embedding Models

Parul Awasthy; Aashka Trivedi; Yulong Li; Mihaela Bornea; David Cox; Abraham Daniels; Martin Franz; Gabe Goodhart; Bhavani Iyer; Vishwajeet Kumar; Luis Lastras; Scott McCarley; Rudra Murthy; Vignesh P; Sara Rosenthal; Salim Roukos; Jaydeep Sen; Sukriti Sharma; Avirup Sil; Kate Soule; Arafat Sultan; Radu Florian

arXiv:2502.20204·cs.IR·February 28, 2025

Open Access · 10 Models

TL;DR

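The paper presents the Granite Embedding models, a family of encoder-based retrieval models with English and multilingual variants. They outperform similar-sized public models on internal IBM retrieval and search tasks and match them on widely used information retrieval benchmarks, using retrieval-oriented pretraining, contrastive finetuning, knowledge distillation, and model merging.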

Contribution

Introduces English and multilingual encoder-based embedding models for dense and sparse retrieval: 12-layer models plus efficient 6-layer distilled counterparts, trained with retrieval-oriented pretraining, contrastive finetuning, knowledge distillation, and model merging.

Findings

01

Models significantly outperform publicly available models of similar size on internal IBM retrieval and search tasks.

02

Models perform on par with similar-sized public models on widely used information retrieval benchmarks.

03

Models are publicly available under an open license for research and commercial use.

Abstract

We introduce the Granite Embedding models, a family of encoder-based embedding models designed for retrieval tasks, spanning dense-retrieval and sparse-retrieval architectures, with both English and multilingual capabilities. This report provides the technical details of training these highly effective 12-layer embedding models, along with their efficient 6-layer distilled counterparts. Extensive evaluations show that the models, developed with techniques like retrieval-oriented pretraining, contrastive finetuning, knowledge distillation, and model merging, significantly outperform publicly available models of similar sizes on both internal IBM retrieval and search tasks, and have equivalent performance on widely used information retrieval benchmarks, while being trained on high-quality data suitable for enterprise use. We publicly release all our Granite Embedding models under the…
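
The abstract distinguishes dense-retrieval from sparse-retrieval architectures without showing how the two score a document. The sketch below illustrates the general idea only: dense retrieval compares fixed-length embedding vectors (here with cosine similarity), while sparse retrieval sums the weights of terms that the query and document share. It is not the Granite implementation; the function names, vectors, and term weights are all hypothetical and chosen for illustration.

```python
import math


def dense_score(query_vec, doc_vec):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    norm = math.sqrt(sum(q * q for q in query_vec)) * math.sqrt(
        sum(d * d for d in doc_vec)
    )
    return dot / norm if norm else 0.0


def sparse_score(query_weights, doc_weights):
    """Dot product over the terms shared by two sparse term-weight maps."""
    return sum(
        weight * doc_weights[term]
        for term, weight in query_weights.items()
        if term in doc_weights
    )


def rank(scores):
    """Return document ids sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


# Toy inputs, invented for illustration (not taken from the report).
dense_query = [1.0, 0.0]
dense_docs = {"doc_a": [0.9, 0.1], "doc_b": [0.0, 1.0]}

sparse_query = {"granite": 1.0, "embedding": 0.5}
sparse_docs = {
    "doc_a": {"granite": 2.0, "model": 1.0},
    "doc_b": {"embedding": 1.0},
}

dense_ranking = rank(
    {doc_id: dense_score(dense_query, vec) for doc_id, vec in dense_docs.items()}
)
sparse_ranking = rank(
    {doc_id: sparse_score(sparse_query, w) for doc_id, w in sparse_docs.items()}
)
```

In a real system the dense vectors and sparse term weights would come from the encoder rather than being written by hand, and an index would replace the exhaustive loop over documents.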

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling · Advanced Graph Neural Networks