Trove: A Flexible Toolkit for Dense Retrieval
Reza Esfandiarpoor, Max Zuo, Stephen H. Bach

TL;DR
Trove is an open-source toolkit that streamlines dense retrieval research. It offers flexible, efficient data management, easy customization, and scalable evaluation pipelines, cutting memory consumption by a factor of 2.6 and simplifying experimentation.
Contribution
Trove is a flexible, low-code toolkit with efficient data handling and customizable components that makes dense retrieval research easier to run and to scale.
Findings
Trove's data management features reduce memory consumption by a factor of 2.6.
Inference time decreases linearly with the number of nodes.
The toolkit simplifies and accelerates retrieval experiments.
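The linear scaling finding is easier to picture with a small example. The sketch below is not Trove's implementation. It only illustrates one common way such scaling can arise: each node takes an even, disjoint share of the inputs, so per-node work shrinks roughly in proportion to the number of nodes. The function name `shard_for_node` and the parameters `rank` and `world_size` are hypothetical names chosen for this illustration.

```python
from typing import List, Sequence


def shard_for_node(items: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Return the slice of items that node `rank` would process.

    Round-robin assignment: node r takes items r, r + world_size, ...
    This is an illustrative assumption, not Trove's actual scheme.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return list(items[rank::world_size])


# Ten queries split across four nodes.
queries = [f"q{i}" for i in range(10)]
shards = [shard_for_node(queries, rank, 4) for rank in range(4)]

# Each node handles at most ceil(10 / 4) = 3 queries instead of all 10.
largest_shard = max(len(shard) for shard in shards)
```

Because the shards are disjoint and nearly equal in size, doubling the node count roughly halves the largest shard, which is the per-node cost that bounds wall-clock time when the nodes run in parallel.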
Abstract
We introduce Trove, an easy-to-use open-source retrieval toolkit that simplifies research experiments without sacrificing flexibility or speed. For the first time, we introduce efficient data management features that load and process (filter, select, transform, and combine) retrieval datasets on the fly, with just a few lines of code. This gives users the flexibility to easily experiment with different dataset configurations without the need to compute and store multiple copies of large datasets. Trove is highly customizable: in addition to many built-in options, it allows users to freely modify existing components or replace them entirely with user-defined objects. It also provides a low-code and unified pipeline for evaluation and hard negative mining, which supports multi-node execution without any code changes. Trove's data management features reduce memory consumption by a factor…
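To make the idea of on-the-fly processing concrete, here is a minimal sketch in plain Python. It does not use Trove's API. Every function name, field name, and record below is a hypothetical stand-in. The point is only that filter, select, transform, and combine steps can be chained lazily, so no intermediate copy of the dataset is materialized until the result is consumed.

```python
import itertools
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def filter_records(records: Iterable[Record], keep: Callable[[Record], bool]) -> Iterator[Record]:
    """Lazily yield only the records for which keep(record) is true."""
    return (r for r in records if keep(r))


def select_fields(records: Iterable[Record], fields: List[str]) -> Iterator[Record]:
    """Lazily keep only the named fields of each record."""
    return ({f: r[f] for f in fields} for r in records)


def transform_records(records: Iterable[Record], fn: Callable[[Record], Record]) -> Iterator[Record]:
    """Lazily apply fn to each record."""
    return map(fn, records)


def combine_sources(*sources: Iterable[Record]) -> Iterator[Record]:
    """Lazily concatenate several record sources."""
    return itertools.chain(*sources)


# Two toy relevance-judgment sources (hypothetical data).
source_a = [
    {"query_id": "q1", "doc_id": "d1", "score": 3, "annotator": "x"},
    {"query_id": "q1", "doc_id": "d2", "score": 0, "annotator": "x"},
]
source_b = [
    {"query_id": "q2", "doc_id": "d7", "score": 1, "annotator": "y"},
]


def binarize(record: Record) -> Record:
    """Add a binary label: 1 if the graded score is positive, else 0."""
    return {**record, "label": 1 if record["score"] > 0 else 0}


def build_pipeline() -> Iterator[Record]:
    """Describe one dataset configuration; nothing runs until iteration."""
    strong_a = filter_records(source_a, lambda r: r["score"] >= 2)
    merged = combine_sources(strong_a, source_b)
    labeled = transform_records(merged, binarize)
    return select_fields(labeled, ["query_id", "doc_id", "label"])


training_records = list(build_pipeline())
```

Trying a different configuration in this sketch means editing `build_pipeline` (for example, changing the score threshold) rather than writing a new preprocessed copy of the data to disk.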
Taxonomy
Topics: Scientific Computing and Data Management · Biomedical Text Mining and Ontologies · Research Data Management Practices
