TENT: A Declarative Slice Spraying Engine for Performant and Resilient Data Movement in Disaggregated LLM Serving

Feng Ren; Ruoyu Qin; Teng Ma; Shangming Cai; Zheng Liu; Chao Lei; Dejiang Zhu; Ke Yang; Zheming Li; Jialei Cui; Weixiao Huang; Yikai Zhao; Yineng Zhang; Hao Wu; Xiang Gao; Yuhao Fu; Jinlei Jiang; Yongwei Wu; Mingxing Zhang

arXiv:2604.00368 · cs.DC · April 2, 2026

TL;DR

TENT is a dynamic, telemetry-driven data movement engine for disaggregated GPU clusters that improves throughput, resilience, and operational efficiency in LLM serving by decoupling transfer intent from physical execution.

Contribution

TENT introduces a telemetry-driven approach to data transfer that unifies heterogeneous links and dynamically slices and reroutes transfers, improving both performance and fault tolerance.
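The idea of dynamic slicing and rerouting can be illustrated with a small model. This is a minimal sketch, not TENT's scheduler. It assumes three things: a transfer is cut into fixed-size slices, each slice is greedily placed on the healthy link whose queue would finish it first, and a failed link's pending slices are re-sprayed onto the surviving links. `Link`, `make_slices`, `spray`, `reroute_on_failure`, and the bandwidth figures are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Hypothetical model of one path (e.g. an RDMA rail or NVLink); not TENT's API."""
    name: str
    bandwidth_gbps: float          # assumed effective bandwidth, e.g. from telemetry
    healthy: bool = True
    backlog_bytes: int = 0         # bytes already queued on this link
    slices: list = field(default_factory=list)

    def finish_time_s(self, extra_bytes: int = 0) -> float:
        # Estimated time until the queue, plus extra_bytes, has drained.
        return (self.backlog_bytes + extra_bytes) * 8 / (self.bandwidth_gbps * 1e9)


def make_slices(total_bytes: int, slice_bytes: int) -> list:
    """Cut one logical transfer into (offset, length) slices."""
    return [(off, min(slice_bytes, total_bytes - off))
            for off in range(0, total_bytes, slice_bytes)]


def spray(slices: list, links: list) -> None:
    """Place each slice on the healthy link that would finish it earliest."""
    for off, length in slices:
        candidates = [link for link in links if link.healthy]
        if not candidates:
            raise RuntimeError("no healthy link available")
        best = min(candidates, key=lambda link: link.finish_time_s(length))
        best.slices.append((off, length))
        best.backlog_bytes += length


def reroute_on_failure(failed: Link, links: list) -> int:
    """Mark a link as failed and re-spray its pending slices onto the survivors."""
    failed.healthy = False
    pending = failed.slices
    failed.slices, failed.backlog_bytes = [], 0
    spray(pending, links)
    return len(pending)


if __name__ == "__main__":
    links = [Link("rdma0", 200), Link("rdma1", 200), Link("nvlink", 400)]
    spray(make_slices(64 << 20, 1 << 20), links)   # 64 MiB in 1 MiB slices
    moved = reroute_on_failure(links[0], links)    # rdma0 goes down
    for link in links:
        print(link.name, link.healthy, len(link.slices))
```

Because placement is driven by each link's estimated drain time rather than a fixed mapping, the faster link naturally absorbs more slices. A failure only changes which links are candidates; the caller's request for the transfer stays the same.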

Findings

01

TENT achieves up to 1.36x higher throughput in LLM inference.

02

TENT reduces P90 TTFT by 26% compared to Mooncake TE.

03

TENT accelerates RL parameter updates by 20-26%.

Abstract

Modern GPU clusters are built upon a complex hierarchy of heterogeneous interconnects, ranging from multi-rail RDMA to proprietary fabrics such as Multi-Node NVLink and Ascend UB. Orchestrating these diverse links effectively remains a critical challenge in disaggregated LLM serving. Operating Mooncake TE on thousands of GPUs exposed a critical limitation shared by existing frameworks: imperative, statically bound path selection. This rigidity forces engines to rely on state-blind striping that ignores congestion signals, creating communication silos, wasting multi-rail bandwidth due to head-of-line blocking, and leading to operational fragility where routine faults require manual intervention. We present TENT, a data-movement engine that decouples transfer intent from physical execution. Instead of locking workloads to fixed backends, TENT unifies heterogeneous interconnects into a…
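The abstract's contrast between state-blind striping and congestion-aware placement can be shown with a toy calculation. The sketch assumes a single transfer split across four rails, one of which is congested, and a completion time set by whichever rail finishes last. That last-rail stall is a simple illustration of how one slow path holds back the whole transfer. The proportional split stands in for telemetry-driven placement and is not the algorithm TENT uses. All function names and bandwidth values are hypothetical.

```python
def static_stripe(total_bytes: int, n_rails: int) -> list:
    """State-blind striping: equal share per rail, regardless of congestion."""
    share, rem = divmod(total_bytes, n_rails)
    return [share + (1 if i < rem else 0) for i in range(n_rails)]


def telemetry_split(total_bytes: int, observed_gbps: list) -> list:
    """Hypothetical congestion-aware split: share proportional to observed bandwidth."""
    total_bw = sum(observed_gbps)
    shares = [int(total_bytes * g / total_bw) for g in observed_gbps]
    # Give the rounding remainder to the fastest rail.
    fastest = max(range(len(observed_gbps)), key=lambda i: observed_gbps[i])
    shares[fastest] += total_bytes - sum(shares)
    return shares


def completion_time_s(bytes_per_rail: list, observed_gbps: list) -> float:
    """The transfer is done only when its slowest rail has finished."""
    return max(b * 8 / (g * 1e9) for b, g in zip(bytes_per_rail, observed_gbps))


if __name__ == "__main__":
    total = 1 << 30                     # 1 GiB transfer
    observed = [200, 200, 200, 50]      # hypothetical: rail 3 is congested
    t_static = completion_time_s(static_stripe(total, len(observed)), observed)
    t_tele = completion_time_s(telemetry_split(total, observed), observed)
    print(f"static: {t_static * 1e3:.1f} ms, telemetry-driven: {t_tele * 1e3:.1f} ms")
```

Under these made-up numbers, static striping is bounded by the 50 Gbps rail and finishes roughly 3x later than the proportional split. The figures illustrate the mechanism only and are unrelated to the paper's measured results.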
