DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
Yongtong Wu, Shaoyuan Chen, Yinmin Zhong, Rilin Huang, Yixuan Tan, Wentao Zhang, Liyue Zhang, Shangyan Zhou, Yuxuan Liu, Shunfeng Zhou, Mingxing Zhang, Xin Jin, Panpan Huang

TL;DR
DualPath significantly enhances agentic LLM inference throughput by introducing a dual-path KV-Cache loading system that alleviates storage bandwidth bottlenecks through optimized data transfer and dynamic load balancing.
Contribution
This paper introduces DualPath, a novel inference system that enables dual-path KV-Cache loading to overcome storage bandwidth limitations in agentic LLM inference.
Findings
Up to 1.87× increase in offline inference throughput.
Average 1.96× improvement in online serving throughput.
Achieves these gains while maintaining SLO compliance.
Abstract
The performance of multi-turn, agentic LLM inference is increasingly dominated by KV-Cache storage I/O rather than computation. In prevalent disaggregated architectures, loading the massive KV-Cache from external storage creates a fundamental imbalance: storage NICs on prefill engines become bandwidth-saturated, while those on decoding engines remain idle. This asymmetry severely constrains overall system throughput. We present DualPath, an inference system that breaks this bottleneck by introducing dual-path KV-Cache loading. Beyond the traditional storage-to-prefill path, DualPath enables a novel storage-to-decode path, in which the KV-Cache is loaded into decoding engines and then efficiently transferred to prefill engines via RDMA over the compute network. DualPath combines this optimized data path -- which inherently avoids network congestion and avoids interference with…
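To make the dual-path idea concrete, the following is a minimal sketch of how a scheduler might choose between the storage-to-prefill path and the storage-to-decode path for each KV-Cache read. It is an illustration under stated assumptions, not the paper's scheduler: the `Engine` class, the `pick_loader` and `load_path` helpers, the bandwidth figures, and the "earliest finish time" rule are all hypothetical, and the cost of the RDMA hop over the compute network is ignored.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    """Hypothetical engine model; field names are illustrative assumptions."""
    name: str
    role: str               # "prefill" or "decode"
    nic_gbps: float         # storage NIC bandwidth in Gbit/s (assumed value)
    queued_gb: float = 0.0  # gigabytes already queued on this storage NIC

    def eta_seconds(self, size_gb: float) -> float:
        # Time to drain the current queue plus this read (GB -> Gbit, then / Gbps).
        return (self.queued_gb + size_gb) * 8 / self.nic_gbps


def pick_loader(engines: list[Engine], size_gb: float) -> Engine:
    """Assign the read to the engine whose storage NIC would finish it soonest.

    Assumption: a simple greedy rule standing in for dynamic load balancing.
    Ties go to the engine listed first.
    """
    best = min(engines, key=lambda e: e.eta_seconds(size_gb))
    best.queued_gb += size_gb
    return best


def load_path(engine: Engine) -> str:
    """Name the data path implied by the chosen loader."""
    if engine.role == "prefill":
        return "storage->prefill"
    # Decode engine reads from storage, then forwards the KV-Cache to prefill.
    return "storage->decode->prefill (RDMA over compute network)"


if __name__ == "__main__":
    engines = [Engine("P0", "prefill", 400.0), Engine("D0", "decode", 400.0)]
    for i in range(3):
        chosen = pick_loader(engines, size_gb=10.0)
        print(f"request {i}: {chosen.name} via {load_path(chosen)}")
```

With one prefill and one decode engine of equal storage NIC bandwidth, this greedy rule alternates reads between the two paths, so the otherwise idle decode-side NIC carries roughly half of the storage traffic.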
Taxonomy
Topics: Advanced Data Storage Technologies · Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques
