TraCT: Disaggregated LLM Serving with CXL Shared Memory KV Cache at Rack-Scale

Dongha Yoon, Younghoon Min, Hoshik Kim, Sam H. Noh, and Jongryool Kim

arXiv:2512.18194 · cs.DC · December 23, 2025

TL;DR

TraCT uses CXL shared memory for disaggregated LLM serving. By removing the NIC hop from KV transfer, it reduces average TTFT by up to 9.8x, lowers P99 latency by up to 6.2x, and improves peak throughput by up to 1.6x.

Contribution

This paper introduces TraCT, a rack-scale LLM serving system that uses CXL shared memory both to transfer KV blocks and to hold a rack-wide prefix-aware KV cache, and that addresses the synchronization and consistency challenges of direct shared-memory KV access.

Findings

01. Up to 9.8x reduction in average TTFT
02. Up to 6.2x lower P99 latency
03. Up to 1.6x peak throughput improvement

Abstract

Disaggregated LLM serving improves resource efficiency by separating the compute-intensive prefill phase from the latency-critical decode phase. However, this architecture introduces a fundamental bottleneck: key/value (KV) tensors generated during prefill must be transferred to decode workers, and existing systems rely on RDMA-based network paths for this exchange. As model sizes and context lengths increase, KV transfer dominates both time-to-first-token (TTFT) and peak throughput, and remains highly sensitive to network contention even when prefix reuse is high. This paper presents TraCT, a rack-scale LLM serving system that uses CXL shared memory as both a KV-transfer substrate and a rack-wide prefix-aware KV cache. TraCT enables GPUs to write and read KV blocks directly through CXL load/store and DMA operations, eliminating the NIC hop that constrains existing disaggregated…
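To make the abstract's point about KV transfer growing with model size and context length concrete, here is a minimal back-of-envelope sketch. It is not taken from the paper: the model shape (32 layers, 8 KV heads, head dimension 128, 2-byte elements), the 8192-token context, and the 25 GB/s link rate are hypothetical values chosen only for illustration, and the function names are ours.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache one token occupies across all layers.

    The leading factor of 2 accounts for storing both a key and a value
    tensor per layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def transfer_seconds(num_tokens, per_token_bytes, link_gbytes_per_s):
    """Idealized time to move a prompt's KV cache over a link.

    Ignores protocol overhead and contention, so it is a lower bound.
    """
    return num_tokens * per_token_bytes / (link_gbytes_per_s * 1e9)


# Hypothetical model shape and context length (assumptions, not from the paper).
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
context_tokens = 8192
total_bytes = per_token * context_tokens

# Hypothetical effective link rate in GB/s (assumption).
seconds = transfer_seconds(context_tokens, per_token, link_gbytes_per_s=25.0)

print(f"KV per token: {per_token / 1024:.0f} KiB")
print(f"KV for {context_tokens} tokens: {total_bytes / 2**30:.2f} GiB")
print(f"Idealized transfer time: {seconds * 1000:.1f} ms")
```

Under these assumed numbers a single long prompt already carries about 1 GiB of KV state, which suggests why the path that moves it between prefill and decode workers can weigh heavily on TTFT.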

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Network Packet Processing and Optimization