Pancake: Hierarchical Memory System for Multi-Agent LLM Serving
Zhengding Hu, Zaifeng Pan, Prabhleen Kaur, Vibha Murthy, Zhongkai Yu, Yue Guan, Zhen Wang, Steven Swanson, Yufei Ding

TL;DR
Pancake is a hierarchical memory system designed for multi-agent LLM serving that improves efficiency and throughput by unifying multi-level caching, coordinated index management, and GPU-CPU acceleration.
Contribution
Pancake introduces a multi-tier memory architecture that combines per-agent index caching, cross-agent index coordination, and collaborative GPU-CPU acceleration to manage agentic memory in large-scale LLM serving.
Findings
More than 4.29x end-to-end throughput improvement over existing frameworks on realistic agent workloads
Effective management of large-scale, dynamic agent memory
Compatibility with popular agent frameworks like LangChain and LlamaIndex
Abstract
In this work, we identify and address the core challenges of agentic memory management in LLM serving, where large-scale storage, frequent updates, and multiple coexisting agents jointly introduce complex and high-cost approximate nearest neighbor (ANN) search problems. We present Pancake, a multi-tier agentic memory system that unifies three key techniques: (i) multi-level index caching for single agents, (ii) coordinated index management across multiple agents, and (iii) collaborative GPU-CPU acceleration. Pancake exposes an easy-to-use interface that can be integrated into memory-based agents like Mem-GPT, and is compatible with agentic frameworks such as LangChain and LlamaIndex. Experiments on realistic agent workloads show that Pancake substantially outperforms existing frameworks, achieving more than 4.29x end-to-end throughput improvement.
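To make the idea of multi-level index caching concrete, here is a minimal sketch of a two-tier vector memory for a single agent. It is not Pancake's implementation or interface: the class `TwoTierMemory`, its parameters `hot_capacity` and `hit_threshold`, and the brute-force cosine scan standing in for a real ANN index are all assumptions made for illustration. A small LRU "hot" tier is searched first, and the full "cold" tier is scanned only when the hot tier has no sufficiently similar entry.

```python
import math
from collections import OrderedDict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TwoTierMemory:
    """Hypothetical two-tier store: a small LRU hot tier over a full cold tier."""

    def __init__(self, hot_capacity=2, hit_threshold=0.95):
        self.cold = {}            # key -> vector; holds every entry
        self.hot = OrderedDict()  # key -> vector; LRU-ordered subset of cold
        self.hot_capacity = hot_capacity
        self.hit_threshold = hit_threshold
        self.hot_hits = 0
        self.cold_scans = 0

    def insert(self, key, vector):
        self.cold[key] = vector
        if key in self.hot:       # keep a cached copy coherent on update
            self.hot[key] = vector

    @staticmethod
    def _best(tier, query):
        """Brute-force nearest neighbor; a real system would use an ANN index."""
        best_key, best_sim = None, -1.0
        for key, vec in tier.items():
            sim = cosine(query, vec)
            if sim > best_sim:
                best_key, best_sim = key, sim
        return best_key, best_sim

    def _promote(self, key):
        self.hot[key] = self.cold[key]
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # evict least recently used entry

    def search(self, query):
        # Tier 1: answer from the hot tier if a close enough match exists.
        key, sim = self._best(self.hot, query)
        if key is not None and sim >= self.hit_threshold:
            self.hot_hits += 1
            self.hot.move_to_end(key)
            return key
        # Tier 2: fall back to the full store and cache the result.
        self.cold_scans += 1
        key, _ = self._best(self.cold, query)
        if key is not None:
            self._promote(key)
        return key


memory = TwoTierMemory(hot_capacity=2)
memory.insert("a", [1.0, 0.0])
memory.insert("b", [0.0, 1.0])
memory.insert("c", [1.0, 1.0])
first = memory.search([1.0, 0.1])    # hot tier is empty, so the cold tier is scanned
second = memory.search([1.0, 0.05])  # served from the hot tier
```

The threshold shortcut trades accuracy for speed: a hot-tier hit may hide a slightly better match that lives only in the cold tier, which is the same kind of trade-off any approximate search makes.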
Taxonomy
Topics: Network Packet Processing and Optimization · Graph Theory and Algorithms · Multi-Agent Systems and Negotiation
