TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing

Zhuohang Bian, Feiyang Wu, Chengrui Zhang, Hangcheng Dong, Yun Liang, and Youwei Zhuo

arXiv:2604.03143 · cs.DC · April 6, 2026

TL;DR

TokenDance is a multi-agent LLM serving system that reuses KV Cache collectively across all agents in a round. This reduces per-agent KV Cache storage and increases the number of agents that can be served concurrently.

Contribution

The paper presents TokenDance, a KV Cache sharing system that exploits the All-Gather communication pattern of multi-agent rounds for collective reuse. This enables higher agent concurrency and more efficient KV Cache storage.
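The redundancy behind collective reuse can be seen with a toy count. In each round a central scheduler gathers every agent's output and sends the combined context back to all agents, so each shared output block appears in every prompt.

The sketch below is a minimal illustration under these assumptions:

- A block is identified only by its token content.
- Positional effects on KV values are ignored.
- "Reuse cost" means one lookup per block handled.

The names `Block`, `all_gather_round`, `per_request_reuse_ops` and `collective_reuse_ops` are hypothetical. They are not TokenDance's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A fixed-size run of tokens, identified (and hashed) by its content."""
    tokens: tuple[int, ...]


def all_gather_round(private: list[list[Block]], outputs: list[Block]) -> list[list[Block]]:
    """Next-round prompt for each agent: its private context plus ALL agents' outputs."""
    return [ctx + outputs for ctx in private]


def per_request_reuse_ops(prompts: list[list[Block]]) -> int:
    """Per-request caching: each prompt resolves every one of its blocks on its own."""
    return sum(len(p) for p in prompts)


def collective_reuse_ops(prompts: list[list[Block]]) -> int:
    """Round-level (collective) reuse: each distinct block is resolved once per round."""
    distinct = {block for p in prompts for block in p}
    return len(distinct)


if __name__ == "__main__":
    n = 4  # agents, each with 2 private blocks and 1 output block
    private = [[Block((100 * a + j,)) for j in range(2)] for a in range(n)]
    outputs = [Block((1000 + a,)) for a in range(n)]
    prompts = all_gather_round(private, outputs)
    print(per_request_reuse_ops(prompts), collective_reuse_ops(prompts))  # 24 12
```

With n agents each contributing one output block, per-request handling touches shared blocks n × n times per round. Handling the round collectively touches each shared block once, so that part of the cost grows with n rather than n².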

Findings

1. Supports up to 2.7x more concurrent agents than vLLM.
2. Reduces per-agent KV Cache storage by up to 17.5x.
3. Achieves up to 1.9x prefill speedup over per-request caching.

Abstract

Multi-agent LLM applications organize execution in synchronized rounds where a central scheduler gathers outputs from all agents and redistributes the combined context. This All-Gather communication pattern creates massive KV Cache redundancy, because every agent's prompt contains the same shared output blocks, yet existing reuse methods fail to exploit it efficiently. We present TokenDance, a system that scales the number of concurrent agents by exploiting the All-Gather pattern for collective KV Cache sharing. TokenDance's KV Collector performs KV Cache reuse over the full round in one collective step, so the cost of reusing a shared block is paid once regardless of agent count. Its Diff-Aware Storage encodes sibling caches as block-sparse diffs against a single master copy, achieving 11-17x compression on representative workloads. Evaluation on GenerativeAgents and AgentSociety shows…
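Diff-Aware Storage, as described above, keeps one master copy and stores each sibling cache as a block-sparse diff against it. The sketch below shows that general idea under simplifying assumptions:

- Caches are block-aligned and of equal length.
- Each block is a flattened tuple of floats.
- A block is stored in a diff only if it differs exactly from the master's block at the same index, so decoding is lossless.

The names `SparseDiff`, `encode`, `decode` and `compression_ratio` are hypothetical. This is not the paper's implementation, and the toy ratio it prints is unrelated to the 11-17x reported for the paper's workloads.

```python
from dataclasses import dataclass

Block = tuple[float, ...]  # one KV block, flattened (hypothetical layout)


@dataclass
class SparseDiff:
    """A sibling cache stored as {block_index: block} overrides of a master copy."""
    length: int
    changed: dict[int, Block]


def encode(master: list[Block], sibling: list[Block]) -> SparseDiff:
    """Keep only the sibling blocks that differ from the master at the same index."""
    assert len(master) == len(sibling), "sketch assumes block-aligned, equal-length caches"
    changed = {i: s for i, (m, s) in enumerate(zip(master, sibling)) if s != m}
    return SparseDiff(len(sibling), changed)


def decode(master: list[Block], diff: SparseDiff) -> list[Block]:
    """Rebuild the full sibling cache: master blocks, with the diff's blocks substituted."""
    return [diff.changed.get(i, master[i]) for i in range(diff.length)]


def compression_ratio(diffs: list[SparseDiff], block_count: int) -> float:
    """Dense storage (one full cache per sibling) over master-plus-diffs storage.
    Counts blocks only; index/metadata overhead is ignored."""
    dense = len(diffs) * block_count
    sparse = block_count + sum(len(d.changed) for d in diffs)
    return dense / sparse


if __name__ == "__main__":
    master = [(float(i), 0.0) for i in range(20)]
    siblings = []
    for k in range(10):  # 10 siblings, each differing from the master in 2 blocks
        s = list(master)
        s[k] = (float(k), 1.0)
        s[19 - k] = (float(19 - k), 2.0)
        siblings.append(s)
    diffs = [encode(master, s) for s in siblings]
    print(compression_ratio(diffs, len(master)))  # 5.0
```

The ratio improves as siblings share more blocks with the master. With no differing blocks, N siblings cost about the storage of one cache.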
