Trove: A Flexible Toolkit for Dense Retrieval

Reza Esfandiarpoor; Max Zuo; Stephen H. Bach

arXiv:2511.01857 · cs.IR · November 4, 2025

TL;DR

Trove is an open-source toolkit that streamlines dense retrieval research by offering flexible, efficient data management, easy customization, and scalable evaluation pipelines, significantly reducing memory use and simplifying experimentation.

Contribution

Trove introduces a flexible, low-code toolkit with efficient data handling and customizable components, enabling easier and more scalable dense retrieval research.

Findings

01. Data management reduces memory consumption by a factor of 2.6.
02. Inference time decreases linearly with the number of nodes.
03. The toolkit simplifies and accelerates retrieval experiments.
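Finding 02 is what one would expect when the corpus is split evenly across nodes, so that each node encodes only about 1/N of the documents. Below is a minimal sketch of such a contiguous partition. `shard_bounds` is a hypothetical helper for illustration, not Trove's implementation, and real speedups also depend on communication and result-merging costs.

```python
from __future__ import annotations


def shard_bounds(num_items: int, num_nodes: int, rank: int) -> tuple[int, int]:
    """Return the [start, end) slice of items assigned to node `rank`.

    Items are split into contiguous, near-equal shards; the first
    `num_items % num_nodes` shards receive one extra item.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    if not 0 <= rank < num_nodes:
        raise ValueError("rank must be in [0, num_nodes)")
    base, extra = divmod(num_items, num_nodes)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


if __name__ == "__main__":
    # Hypothetical corpus of 10 documents split across 3 nodes;
    # each node would encode only its own slice.
    for rank in range(3):
        print(rank, shard_bounds(10, 3, rank))
    # 0 (0, 4)
    # 1 (4, 7)
    # 2 (7, 10)
```

Giving the first `num_items % num_nodes` shards one extra item keeps shard sizes within one of each other, so no single node becomes a straggler that erodes the per-node speedup.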

Abstract

We introduce Trove, an easy-to-use open-source retrieval toolkit that simplifies research experiments without sacrificing flexibility or speed. For the first time, we introduce efficient data management features that load and process (filter, select, transform, and combine) retrieval datasets on the fly, with just a few lines of code. This gives users the flexibility to easily experiment with different dataset configurations without the need to compute and store multiple copies of large datasets. Trove is highly customizable: in addition to many built-in options, it allows users to freely modify existing components or replace them entirely with user-defined objects. It also provides a low-code and unified pipeline for evaluation and hard negative mining, which supports multi-node execution without any code changes. Trove's data management features reduce memory consumption by a factor…
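The abstract describes loading and processing retrieval datasets on the fly (filter, select, transform, combine) instead of storing a processed copy for every configuration. Below is a minimal sketch of that general idea using Python generators, so each step is recomputed on iteration rather than materialized. `LazyDataset`, `read_qrels`, and the record fields are hypothetical illustrations, not Trove's API, and `select` is interpreted here as "keep the first n records".

```python
from __future__ import annotations

import itertools
from typing import Callable, Iterable, Iterator


class LazyDataset:
    """Hypothetical lazy dataset: each operation wraps its parent, and
    nothing is computed until the dataset is iterated."""

    def __init__(self, source: Callable[[], Iterable[dict]]):
        # `source` is a zero-argument callable, so the data is re-read on
        # every iteration instead of being cached in memory.
        self._source = source

    def __iter__(self) -> Iterator[dict]:
        return iter(self._source())

    def filter(self, pred: Callable[[dict], bool]) -> LazyDataset:
        return LazyDataset(lambda: (r for r in self._source() if pred(r)))

    def map(self, fn: Callable[[dict], dict]) -> LazyDataset:
        return LazyDataset(lambda: (fn(r) for r in self._source()))

    def select(self, n: int) -> LazyDataset:
        # Keep only the first n records of the parent.
        return LazyDataset(lambda: itertools.islice(self._source(), n))

    def concat(self, other: LazyDataset) -> LazyDataset:
        # Combine two datasets by streaming one after the other.
        return LazyDataset(lambda: itertools.chain(self._source(), other._source()))


def read_qrels():
    # Stand-in for reading a relevance-judgment file line by line.
    yield {"qid": "q1", "docid": "d1", "score": 2}
    yield {"qid": "q1", "docid": "d2", "score": 0}
    yield {"qid": "q2", "docid": "d3", "score": 1}


base = LazyDataset(read_qrels)
positives = base.filter(lambda r: r["score"] > 0)
binary = positives.map(lambda r: {**r, "score": 1})  # binarize graded labels
mixed = binary.concat(base.filter(lambda r: r["score"] == 0).select(1))
print(list(mixed))
# [{'qid': 'q1', 'docid': 'd1', 'score': 1}, {'qid': 'q2', 'docid': 'd3', 'score': 1}, {'qid': 'q1', 'docid': 'd2', 'score': 0}]
```

Each derived dataset stores only a reference to its parent and a function, so trying another configuration costs a few closures rather than another copy of the data. The trade-off is that the processing is redone on every pass.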

Videos

Trove: A Flexible Toolkit for Dense Retrieval · Underline

Taxonomy

Topics: Scientific Computing and Data Management · Biomedical Text Mining and Ontologies · Research Data Management Practices