Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents
Qizheng Zhang, Michael Wornow, Gerry Wan, Kunle Olukotun

TL;DR
Agentic Plan Caching (APC) introduces a test-time memory system that extracts and reuses structured plan templates from agent executions, significantly reducing costs and latency in LLM-based agents while maintaining performance.
Contribution
This paper presents APC, a novel caching method that extracts, matches, and adapts plan templates at test time, addressing the limitations of existing caching techniques when applied to agent applications.
Findings
Cost reduced by 50.31% on average.
Latency decreased by 27.28% on average.
Maintains performance while improving efficiency.
Abstract
LLM-based agent applications have shown increasingly remarkable capabilities in complex workflows but incur substantial costs and latency due to extensive planning and reasoning requirements. Existing LLM caching techniques (like context caching and semantic caching), primarily designed for serving chatbots, are insufficient for agent applications where outputs depend on external data and environmental contexts. We propose Agentic Plan Caching (APC), a novel test-time memory that extracts, stores, adapts, and reuses structured plan templates from planning stages of agent applications across semantically similar tasks to reduce the cost and latency of serving. Unlike traditional semantic caching, our system extracts plan templates from completed agent executions at test-time, employs keyword extraction to match new requests against cached plans, and utilizes lightweight models to adapt…
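The extract, match, and adapt loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`extract_keywords`, `PlanCache`, `serve`, `plan_from_scratch`, `adapt_template`) is hypothetical, keyword extraction is replaced by a toy stopword filter, and matching uses Jaccard overlap with an arbitrary threshold purely for illustration. The two callables stand in for an expensive planner model and a lightweight adapter model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Tuple

# Toy stopword list; a real system would use a proper keyword extractor (assumption).
STOPWORDS = {"the", "a", "an", "of", "for", "in", "to", "and", "what", "is", "was"}


def extract_keywords(task: str) -> FrozenSet[str]:
    """Lowercase alphabetic tokens minus stopwords (numbers are dropped)."""
    tokens = re.findall(r"[a-z]+", task.lower())
    return frozenset(t for t in tokens if t not in STOPWORDS)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set overlap in [0, 1]; used here as a stand-in matching rule."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class PlanCache:
    threshold: float = 0.5  # illustrative value, not taken from the paper
    entries: List[Tuple[FrozenSet[str], List[str]]] = field(default_factory=list)

    def lookup(self, task: str) -> Optional[List[str]]:
        """Return the best-matching cached template, or None on a miss."""
        keys = extract_keywords(task)
        best = max(self.entries, key=lambda e: jaccard(keys, e[0]), default=None)
        if best is not None and jaccard(keys, best[0]) >= self.threshold:
            return best[1]
        return None

    def store(self, task: str, template: List[str]) -> None:
        self.entries.append((extract_keywords(task), template))


def serve(
    task: str,
    cache: PlanCache,
    plan_from_scratch: Callable[[str], List[str]],
    adapt_template: Callable[[List[str], str], List[str]],
) -> Tuple[List[str], bool]:
    """Return (plan, cache_hit). On a hit only the cheap adapter is called."""
    template = cache.lookup(task)
    if template is not None:
        return adapt_template(template, task), True
    plan = plan_from_scratch(task)
    # A real system would strip task-specific details before caching;
    # this sketch stores the plan as-is.
    cache.store(task, plan)
    return plan, False
```

In this sketch, the first request for a task pays for the expensive planner and populates the cache; a later request with sufficiently overlapping keywords is served by adapting the stored template instead of planning again.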
Taxonomy
Topics: Mobile Agent-Based Network Management · Distributed systems and fault tolerance · Caching and Content Delivery
