Swarm: Co-Activation Aware KVCache Offloading Across Multiple SSDs

Tuowei Wang; Liyun Chu; Ruwen Fan; Ju Ren

arXiv:2603.17803 · cs.PF · March 19, 2026

TL;DR

Swarm is a system that exploits the stable co-activation patterns of KVCache entries in large language models to efficiently offload cache data across multiple SSDs, significantly improving I/O performance.

Contribution

The paper introduces the concept of KVCache Co-Activation and develops Swarm, a multi-SSD offloading system that increases parallel I/O and bandwidth utilization for LLM inference workloads.

Findings

01. Reduces I/O time by 2.41x

02. Improves bandwidth utilization by 2.72x

03. Effectively adapts to evolving access patterns

Abstract

The key-value (KV) cache has become the dominant contributor to memory consumption in large language model (LLM) inference. Although offloading KVCache from GPU high-bandwidth memory (HBM) to CPU DRAM alleviates device memory pressure, DRAM remains capacity-limited and costly for large, persistent workloads. Solid-state drives (SSDs) provide a cost-effective alternative, but naive SSD-based paging is fundamentally bandwidth-bound due to limited PCIe throughput and per-device bandwidth constraints. In this paper, we observe that KVCache activations in real-world workloads exhibit strong and stable correlations. We term this phenomenon KVCache Co-Activation, where accessing a KV entry is often accompanied by a stable and recurring set of other KV entries. Leveraging this property, we present Swarm, an SSD-based KVCache offloading system that converts bandwidth-bound single-device access…
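To make the co-activation idea concrete, the following is a minimal sketch, not the paper's algorithm. It assumes an access trace is available as a list of steps, each listing the KV entry IDs activated together, and it uses a simple greedy heuristic (an assumption of this example) to spread frequently co-activated entries across different SSDs so that one step's reads can proceed in parallel. The names `co_activation_counts`, `place_entries`, and `max_reads_per_ssd` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_activation_counts(trace):
    """Count how often each pair of KV entries is activated in the same step."""
    counts = Counter()
    for step in trace:
        for a, b in combinations(sorted(set(step)), 2):
            counts[(a, b)] += 1
    return counts


def place_entries(trace, num_ssds):
    """Greedy placement (illustrative heuristic, not Swarm's actual policy).

    Each entry goes to the SSD that currently holds the least co-activation
    weight with it, so entries that fire together land on different devices.
    """
    counts = co_activation_counts(trace)
    freq = Counter(e for step in trace for e in set(step))
    placement = {}
    # Place the most frequently activated entries first; break ties by ID.
    for entry in sorted(freq, key=lambda e: (-freq[e], e)):
        load = [0] * num_ssds  # co-activation weight with `entry` per SSD
        size = [0] * num_ssds  # number of entries already on each SSD
        for other, ssd in placement.items():
            pair = (min(entry, other), max(entry, other))
            load[ssd] += counts[pair]  # Counter returns 0 for unseen pairs
            size[ssd] += 1
        placement[entry] = min(range(num_ssds), key=lambda s: (load[s], size[s], s))
    return placement


def max_reads_per_ssd(step, placement):
    """Largest number of reads any single SSD must serve for one step."""
    per_ssd = Counter(placement[e] for e in set(step))
    return max(per_ssd.values())


# Hypothetical trace: entries 0-3 fire together, as do entries 4 and 5.
trace = [[0, 1, 2, 3], [0, 1, 2, 3], [4, 5], [0, 1, 2, 3], [4, 5]]
placement = place_entries(trace, num_ssds=4)
print(placement)
print(max_reads_per_ssd([0, 1, 2, 3], placement))
```

In this toy trace, each co-activated group ends up spread over distinct devices, so no SSD serves more than one read per step. A real system would also have to account for entry sizes, device capacity, and access patterns that change over time.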

Taxonomy

Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management