KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows

Zaifeng Pan; Ajjkumar Patel; Zhengding Hu; Yipeng Shen; Yue Guan; Wan-Lu Li; Lianhui Qin; Yida Wang; Yufei Ding

arXiv:2507.07400 · cs.DC · July 11, 2025

TL;DR

KVFlow is a workflow-aware KV cache management framework that speeds up LLM-based multi-agent workflows by predicting which agents will run soon and proactively prefetching their key-value tensors.

Contribution

KVFlow introduces a novel agent step graph-based cache eviction policy and a proactive prefetching mechanism tailored for multi-agent LLM workflows.
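
To make the idea of a graph-based eviction policy concrete, here is a minimal sketch, not the authors' implementation. It assumes the workflow is given as a plain successor map, and that an agent's steps-to-execution value can be approximated by its shortest-path distance from the agents that are currently running. The names `steps_to_execution` and `pick_eviction_victim` are hypothetical, as is the rule of evicting the agent whose next run is furthest away.

```python
from collections import deque
from math import inf


def steps_to_execution(successors, running):
    """Approximate each agent's distance (in workflow steps) from the running agents.

    `successors` maps an agent name to the agents that may run right after it.
    Agents that cannot be reached from `running` keep the value `inf`.
    This shortest-path rule is an illustrative assumption.
    """
    steps = {agent: inf for agent in successors}
    queue = deque()
    for agent in running:
        steps[agent] = 0
        queue.append(agent)
    while queue:
        current = queue.popleft()
        for nxt in successors.get(current, ()):
            if steps.get(nxt, inf) == inf:  # not visited yet
                steps[nxt] = steps[current] + 1
                queue.append(nxt)
    return steps


def pick_eviction_victim(cached_agents, steps):
    """Pick the cached agent with the largest steps-to-execution value.

    Ties are broken by agent name so the choice is deterministic.
    """
    return max(cached_agents, key=lambda a: (steps.get(a, inf), a))


# Hypothetical four-agent sequential workflow.
workflow = {
    "planner": ["coder"],
    "coder": ["tester"],
    "tester": ["reviewer"],
    "reviewer": [],
}
steps = steps_to_execution(workflow, running={"planner"})
victim = pick_eviction_victim({"coder", "tester", "reviewer"}, steps)
```

With `planner` running, `coder` is one step away and `reviewer` three, so the sketch selects `reviewer` for eviction. A recency-based policy would decide from past accesses alone and could not make this distinction.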

Findings

01. Achieves up to 1.83× speedup for single workflows with large prompts.

02. Achieves up to 2.19× speedup in multi-workflow scenarios.

03. Reduces cache misses and recomputation overhead.

Abstract

Large language model (LLM) based agentic workflows have become a popular paradigm for coordinating multiple specialized agents to solve complex tasks. To improve serving efficiency, existing LLM systems employ prefix caching to reuse key-value (KV) tensors corresponding to agents' fixed prompts, thereby avoiding redundant computation across repeated invocations. However, current systems typically evict KV caches using a Least Recently Used (LRU) policy, which fails to anticipate future agent usage and often discards KV caches shortly before their reuse. This leads to frequent cache misses and substantial recomputation or swapping overhead. We present KVFlow, a workflow-aware KV cache management framework tailored for agentic workloads. KVFlow abstracts the agent execution schedule as an Agent Step Graph and assigns each agent a steps-to-execution value that estimates its temporal…
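
The LRU failure mode described in the abstract can be reproduced with a toy simulation. The sketch below is an illustration under stated assumptions, not KVFlow itself. It treats each agent's prefix cache as one unit, assumes the full execution schedule is known in advance, and uses the classic farthest-next-use (Belady) rule as a stand-in for a workflow-aware policy. The function names and the three-agent loop are hypothetical.

```python
from collections import OrderedDict


def lru_misses(schedule, capacity):
    """Count cache misses when evicting the least recently used agent."""
    cache = OrderedDict()
    misses = 0
    for agent in schedule:
        if agent in cache:
            cache.move_to_end(agent)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # drop the least recently used entry
            cache[agent] = True
    return misses


def schedule_aware_misses(schedule, capacity):
    """Count cache misses when evicting the agent whose next use is furthest away.

    This is the farthest-next-use (Belady) rule, used here only as a
    stand-in for a workflow-aware policy; it assumes the schedule is known.
    """
    cache = set()
    misses = 0
    for i, agent in enumerate(schedule):
        if agent in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            def next_use(a):
                try:
                    return schedule.index(a, i + 1)
                except ValueError:
                    return len(schedule)  # never used again

            cache.remove(max(cache, key=next_use))
        cache.add(agent)
    return misses


# Three agents invoked in a loop, with room for only two cached prefixes.
schedule = ["A", "B", "C"] * 4
lru = lru_misses(schedule, capacity=2)               # 12: every call misses
aware = schedule_aware_misses(schedule, capacity=2)  # 7
```

In this loop, LRU always evicts exactly the agent that runs next, so all 12 invocations miss. The schedule-aware rule keeps the agent that is about to run and misses only 7 times.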


Videos

KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows (SlidesLive)

Taxonomy

Topics: Big Data and Digital Economy · Scientific Computing and Data Management · Multi-Agent Systems and Negotiation