DS SERVE: A Framework for Efficient and Scalable Neural Retrieval

Jinjian Liu; Yichuan Wang; Xinxi Lyu; Rulin Shao; Joseph E. Gonzalez; Matei Zaharia; Sewon Min

arXiv:2602.22224 · cs.IR · February 27, 2026

TL;DR

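DS-Serve is a neural retrieval framework that serves a text dataset of half a trillion tokens from a single node with low latency and modest memory overhead. It exposes inference-time trade-offs between latency, accuracy, and result diversity, and targets applications such as RAG and training data attribution.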

Contribution

DS-Serve is a framework that turns large-scale text datasets into high-performance neural retrieval systems, accessible through a web interface and API endpoints, with flexible inference-time options.

Findings

01. Serves retrieval over a text dataset of half a trillion tokens

02. Achieves low latency with modest memory overhead on a single node

03. Supports inference-time trade-offs between latency, accuracy, and result diversity

Abstract

We present DS-Serve, a framework that transforms large-scale text datasets, comprising half a trillion tokens, into a high-performance neural retrieval system. DS-Serve offers both a web interface and API endpoints, achieving low latency with modest memory overhead on a single node. The framework also supports inference-time trade-offs between latency, accuracy, and result diversity. We anticipate that DS-Serve will be broadly useful for a range of applications, including large-scale retrieval-augmented generation (RAG), training data attribution, training search agents, and beyond.

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Multimodal Machine Learning Applications