RelServe: Fast LLM Inference Serving on Relational Data

Xin Zhang; Shihong Gao; Yanyan Shen; Haoyang Li; Lei Chen

arXiv:2601.11546·cs.DB·January 21, 2026

TL;DR

RelServe is an LLM inference engine designed for relQuery workloads on relational data. It reduces latency (by up to 3.1x compared with vLLM) through dynamic prioritization and adaptive batching, enabling faster responses in AI-powered applications.

Contribution

RelServe introduces a Dynamic Priority Updater and an Adaptive Batch Arranger to optimize LLM inference serving, addressing Head-of-Line (HoL) blocking and latency trade-offs in relQuery workloads.
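To make the difference between static and updated priorities concrete, here is a toy sketch. It is not RelServe's Dynamic Priority Updater; it assumes, hypothetically, that each relQuery issues one LLM call per table row and that a query's priority is simply its number of pending rows (fewer pending rows means higher priority). The names `RelQuery`, `dynamic_priority`, and `pick_order` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class RelQuery:
    # Hypothetical model: one relQuery = one templated LLM call per table row.
    qid: str
    pending_rows: int  # rows whose LLM call has not finished yet
    # Static priority is frozen at arrival (here: the initial row count).
    static_priority: int = field(init=False)

    def __post_init__(self) -> None:
        self.static_priority = self.pending_rows


def dynamic_priority(q: RelQuery) -> int:
    # Hypothetical rule: re-evaluated at every scheduling step,
    # fewer remaining rows -> smaller key -> served first.
    return q.pending_rows


def pick_order(queries: list[RelQuery], dynamic: bool = True) -> list[str]:
    # Return query ids in the order a scheduler would serve them.
    key = dynamic_priority if dynamic else (lambda q: q.static_priority)
    return [q.qid for q in sorted(queries, key=key)]


if __name__ == "__main__":
    a = RelQuery("A", 100)
    b = RelQuery("B", 40)
    a.pending_rows = 10  # A has already finished 90 of its 100 rows
    print(pick_order([a, b], dynamic=False))  # static key still favors B
    print(pick_order([a, b], dynamic=True))   # updated key now favors A
```

With the static key, B keeps outranking A even though A has only 10 rows left, so the nearly finished query waits behind a larger one. Re-evaluating the key at each step reverses that order. A real engine would also weigh prompt and output lengths, which this sketch ignores.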

Findings

01. Up to 3.1x latency reduction compared with vLLM

02. Effective handling of concurrent relQuery workloads

03. Validated on four real-world datasets using large LLMs

Abstract

The use of Large Language Models (LLMs) for querying relational data has given rise to relQuery, a workload pattern that applies templated LLM calls to structured tables. As relQuery services become more widely adopted in applications such as AI-powered spreadsheets, fast response times under concurrent query loads are increasingly important. Unfortunately, current LLM engines face severe latency bottlenecks from Head-of-Line (HoL) blocking across three comparable inference phases: waiting, core running, and tail running. Existing static priority scheduling methods only address HoL blocking during the waiting phase, leaving two critical problems unsolved. First, the absence of a priority update mechanism causes inaccurate prioritization and continued HoL blocking during core execution. Second, suboptimal prefill-decode batching exacerbates HoL blocking in tail execution and worsens…
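Head-of-Line blocking in the waiting phase can be illustrated with a generic single-server queue. The sketch below is a textbook queueing example under simplifying assumptions (all requests arrive at time 0, no batching, no preemption) and is not a model of RelServe or vLLM. It compares first-come-first-served order with shortest-first order when one long request sits at the head of the queue.

```python
def mean_completion_time(service_times: list[int]) -> float:
    # Requests are served one after another in the given order;
    # each request's latency is the time at which it finishes.
    t, total = 0, 0
    for s in service_times:
        t += s
        total += t
    return total / len(service_times)


# One long request at the head of the queue, three short ones behind it.
queue = [100, 1, 1, 1]

fcfs = mean_completion_time(queue)          # short requests blocked by the long one
shortest_first = mean_completion_time(sorted(queue))

print(fcfs)            # 101.5
print(shortest_first)  # 27.25
```

Under FCFS the three short requests finish at times 101, 102, and 103 only because they wait behind the long one; reordering them first cuts the mean latency from 101.5 to 27.25 time units. Because the example serves requests strictly one at a time, it captures only the waiting-phase effect; the core- and tail-running HoL blocking described in the abstract arises from batched execution, which this sketch does not model.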


Taxonomy

Topics: Cloud Computing and Resource Management · Data Quality and Management · Software System Performance and Reliability