HEXGEN-FLOW: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL
You Peng, Youhe Jiang, Wenqi Jiang, Chen Wang, Binhang Yuan

TL;DR
HEXGEN-FLOW is a scheduling framework for multi-stage LLM inference workflows in agentic Text-to-SQL. On heterogeneous GPU clusters it reduces P95 tail latency by 1.42 to 1.56 times and increases throughput by 1.49 to 1.81 times.
Contribution
The paper introduces HEXGEN-FLOW, a hierarchical scheduling framework with adaptive tuning for efficient multi-stage LLM-based Text-to-SQL inference on heterogeneous GPU clusters.
Findings
Reduces P95 tail latency by 1.42 to 1.56 times
Increases throughput by 1.49 to 1.81 times
Outperforms existing LLM serving frameworks on realistic benchmarks
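To make the latency figures concrete, here is a minimal sketch of how a P95 tail latency and a "times" reduction factor are commonly computed. It assumes the nearest-rank percentile definition and reads "reduced by k times" as baseline divided by improved. The function names and sample values are hypothetical and do not come from the paper.

```python
def p95(latencies):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def reduction_factor(baseline, improved):
    """How many times smaller the improved value is than the baseline."""
    return baseline / improved


# Hypothetical latency samples in seconds, for illustration only.
baseline_latencies = [float(i) for i in range(1, 101)]   # 1.0 .. 100.0
improved_latencies = [x / 1.5 for x in baseline_latencies]

factor = reduction_factor(p95(baseline_latencies), p95(improved_latencies))
```

Under this reading, a factor of 1.5 means the improved system's P95 latency is two thirds of the baseline's.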
Abstract
Recent advances in agentic large language models (LLMs) have substantially improved Text-to-SQL, enabling users without database expertise to query databases intuitively. However, deploying agentic LLM-based Text-to-SQL systems in production remains challenging due to multi-stage dependencies, strict latency requirements, and deployment complexity across heterogeneous GPUs in enterprise clusters. Existing LLM serving frameworks are designed mainly for independent inference tasks, leading to suboptimal performance and frequent service-level objective (SLO) violations for Text-to-SQL workloads. In this paper, we introduce HEXGEN-FLOW, a framework for scheduling and executing agentic multi-stage LLM-based Text-to-SQL workflows on heterogeneous GPU clusters serving multi-tenant requests. HEXGEN-FLOW adopts a hierarchical scheduler that combines global workload-balanced task dispatching with an adaptive…
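As a rough illustration of what workload-balanced dispatching across heterogeneous GPUs can look like, the sketch below sends each task to the model instance with the earliest estimated finish time. This is a generic greedy heuristic, not the algorithm from the paper. The `Instance` class, its `speed` field, and the cost values are all hypothetical, and the second, adaptive component of the scheduler is not modeled because the abstract is truncated here.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    speed: float               # hypothetical relative speed (work units per second)
    busy_until: float = 0.0    # time at which already-queued work finishes


def dispatch(instances, task_cost, now=0.0):
    """Assign one task to the instance with the earliest estimated finish time."""

    def finish_time(inst):
        start = max(inst.busy_until, now)
        return start + task_cost / inst.speed

    # min() returns the first instance on ties, so list order breaks ties.
    best = min(instances, key=finish_time)
    best.busy_until = finish_time(best)
    return best.name
```

For example, with a hypothetical `fast-gpu` of speed 2.0 and a `slow-gpu` of speed 1.0, three tasks of cost 4.0 go to `fast-gpu`, `fast-gpu`, then `slow-gpu`: the faster device absorbs more work, but the slower one is still used once the fast queue grows long enough.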
Taxonomy
Topics: Scientific Computing and Data Management · Natural Language Processing Techniques · Big Data and Digital Economy
