HedraRAG: Coordinating LLM Generation and Database Retrieval in Heterogeneous RAG Serving
Zhengding Hu, Vibha Murthy, Zaifeng Pan, Wanlu Li, Xiaoyi Fang, Yufei Ding, Yuke Wang

TL;DR
HedraRAG is a runtime system that optimizes heterogeneous RAG serving by dynamically transforming execution graphs, significantly improving resource utilization and reducing latency in complex multi-stage workflows.
Contribution
The paper introduces HedraRAG, a novel graph-based runtime system that coordinates LLM generation and database retrieval for efficient heterogeneous RAG serving.
Findings
Achieves speedups exceeding 1.5x, and up to 5x, over existing frameworks.
Exploits stage-level parallelism, intra-request similarity, and inter-request skewness.
Reduces latency in complex RAG workflows.
Abstract
This paper addresses emerging system-level challenges in heterogeneous retrieval-augmented generation (RAG) serving, where complex multi-stage workflows and diverse request patterns complicate efficient execution. We present HedraRAG, a runtime system built on a graph-based abstraction that exposes optimization opportunities across stage-level parallelism, intra-request similarity, and inter-request skewness. These opportunities are realized through dynamic graph transformations, such as node splitting, reordering, edge addition, and dependency rewiring, applied to wavefronts of subgraphs spanning concurrent requests. The resulting execution plans are mapped onto hybrid CPU-GPU pipelines to improve resource utilization and reduce latency. Evaluations across a wide range of RAG workflows demonstrate speedups exceeding 1.5x and reaching up to 5x over existing frameworks, showcasing the…
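As a rough illustration of one of the transformations named in the abstract, node splitting, the sketch below models a RAG workflow as a small dependency graph and splits a retrieval stage into chained sub-stages. It is not HedraRAG's implementation: the `Graph` class, its methods, and the stage names (`rewrite`, `retrieve`, `answer`) are all hypothetical and chosen only for this example. Splitting a coarse stage into finer sub-stages is one way a scheduler could gain more points at which to interleave CPU retrieval with GPU generation.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical workflow graph: stages are nodes, dependencies are edges."""
    kinds: dict = field(default_factory=dict)   # node name -> "retrieval" or "generation"
    edges: set = field(default_factory=set)     # set of (src, dst) dependency pairs

    def add_node(self, name, kind):
        self.kinds[name] = kind

    def add_edge(self, src, dst):
        self.edges.add((src, dst))

    def split_node(self, name, parts):
        """Replace `name` with a chain name#0 -> name#1 -> ... of the same kind."""
        kind = self.kinds.pop(name)
        subs = [f"{name}#{i}" for i in range(parts)]
        for sub in subs:
            self.kinds[sub] = kind
        rewired = set()
        for src, dst in self.edges:
            if dst == name:
                rewired.add((src, subs[0]))      # incoming edges feed the first sub-stage
            elif src == name:
                rewired.add((subs[-1], dst))     # outgoing edges leave the last sub-stage
            else:
                rewired.add((src, dst))
        for a, b in zip(subs, subs[1:]):
            rewired.add((a, b))                  # chain the sub-stages in order
        self.edges = rewired
        return subs

    def topo_order(self):
        """Return one valid execution order (Kahn's algorithm)."""
        indeg = {n: 0 for n in self.kinds}
        for _, dst in self.edges:
            indeg[dst] += 1
        ready = sorted(n for n, d in indeg.items() if d == 0)
        order = []
        while ready:
            node = ready.pop(0)
            order.append(node)
            for src, dst in sorted(self.edges):
                if src == node:
                    indeg[dst] -= 1
                    if indeg[dst] == 0:
                        ready.append(dst)
        return order


# A three-stage workflow: query rewriting, retrieval, answer generation.
g = Graph()
g.add_node("rewrite", "generation")
g.add_node("retrieve", "retrieval")
g.add_node("answer", "generation")
g.add_edge("rewrite", "retrieve")
g.add_edge("retrieve", "answer")

subs = g.split_node("retrieve", 2)   # retrieval becomes two chained sub-stages
order = g.topo_order()
```

After the split, the graph has four nodes instead of three while the original dependencies are preserved, so any schedule that respects `order` still computes the same workflow.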
Taxonomy
Topics: Advanced Computational Techniques and Applications · Algorithms and Data Compression · Environmental Monitoring and Data Management
