HEXGEN-FLOW: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL

You Peng; Youhe Jiang; Wenqi Jiang; Chen Wang; Binhang Yuan

arXiv:2505.05286 · cs.DB · March 10, 2026

TL;DR

HEXGEN-FLOW is a novel scheduling framework that optimizes multi-stage LLM inference workflows for Text-to-SQL tasks, significantly reducing latency and increasing throughput in heterogeneous GPU environments.

Contribution

The paper introduces HEXGEN-FLOW, a hierarchical scheduling framework with adaptive tuning for efficient multi-stage LLM-based Text-to-SQL inference on heterogeneous GPU clusters.

Findings

01 Reduces P95 tail latency by 1.42 to 1.56 times

02 Increases throughput by 1.49 to 1.81 times

03 Outperforms existing LLM serving frameworks on realistic benchmarks

Abstract

Recent advances in agentic large language models (LLMs) have substantially improved Text-to-SQL, enabling users without database expertise to query databases intuitively. However, deploying agentic LLM-based Text-to-SQL systems in production remains challenging due to multi-stage dependencies, strict latency requirements, and deployment complexity across heterogeneous GPUs in enterprise clusters. Existing LLM serving frameworks are designed mainly for independent inference tasks, leading to suboptimal performance and frequent service-level objective (SLO) violations for Text-to-SQL workloads. In this paper, we introduce HEXGEN-FLOW, a framework for scheduling and executing agentic multi-stage LLM-based Text-to-SQL workflows on heterogeneous GPU clusters serving multi-tenant requests. HEXGEN-FLOW adopts a hierarchical scheduler that combines global workload-balanced task dispatching with an adaptive…
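
To make the idea of workload-balanced dispatching across heterogeneous GPUs concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a simple greedy rule (send each request to the instance with the earliest estimated finish time), and the `Instance` class, the `dispatch` function, the instance names, and the throughput figures are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical model-serving instance on one GPU type."""
    name: str
    tokens_per_sec: float        # assumed throughput, not a measured value
    queued_tokens: float = 0.0   # work already assigned to this instance

    def finish_time(self, extra_tokens: float = 0.0) -> float:
        # Estimated seconds until the queue (plus the new request) drains.
        return (self.queued_tokens + extra_tokens) / self.tokens_per_sec


def dispatch(instances: list[Instance], request_tokens: float) -> Instance:
    """Greedy rule (an assumption): pick the earliest estimated finish time."""
    best = min(instances, key=lambda inst: inst.finish_time(request_tokens))
    best.queued_tokens += request_tokens
    return best


# Two instances with different speeds, four equally sized requests.
instances = [Instance("fast-gpu", 2000.0), Instance("slow-gpu", 600.0)]
assignments = [dispatch(instances, 1000.0).name for _ in range(4)]
print(assignments)
```

Under these assumed numbers the faster instance absorbs the first three requests, and the fourth goes to the slower one once the fast queue's estimated finish time exceeds it. A round-robin dispatcher that ignores instance speed would instead alternate between the two.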

Code & Models

Repositories

relaxed-system-lab/hexgen-flow
Official

Datasets

fredpeng/Text2SQL_Workflow_Trace
dataset · 20 dl

Taxonomy

Topics: Scientific Computing and Data Management · Natural Language Processing Techniques · Big Data and Digital Economy