Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents

Qizheng Zhang; Michael Wornow; Gerry Wan; Kunle Olukotun

arXiv:2506.14852 · cs.DC · January 28, 2026

TL;DR

Agentic Plan Caching (APC) introduces a test-time memory system that extracts and reuses structured plan templates from agent executions, significantly reducing costs and latency in LLM-based agents while maintaining performance.

Contribution

This paper presents APC, a novel caching method that extracts, matches, and adapts plan templates at test-time, addressing limitations of existing caching techniques for agent applications.

Findings

01. Cost reduced by 50.31% on average.
02. Latency decreased by 27.28% on average.
03. Maintains performance while improving efficiency.

Abstract

LLM-based agent applications have shown increasingly remarkable capabilities in complex workflows but incur substantial costs and latency due to extensive planning and reasoning requirements. Existing LLM caching techniques (like context caching and semantic caching), primarily designed for serving chatbots, are insufficient for agent applications where outputs depend on external data and environmental contexts. We propose Agentic Plan Caching (APC), a novel test-time memory that extracts, stores, adapts, and reuses structured plan templates from planning stages of agent applications across semantically similar tasks to reduce the cost and latency of serving. Unlike traditional semantic caching, our system extracts plan templates from completed agent executions at test-time, employs keyword extraction to match new requests against cached plans, and utilizes lightweight models to adapt…
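The extract, match, adapt loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: every name here (`extract_keywords`, `PlanCache`, `adapt`) is hypothetical, the stopword-based keyword extractor and the Jaccard-overlap threshold are stand-ins chosen for illustration, and plain string substitution takes the place of the lightweight model that the paper uses to adapt a cached plan.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumption: a tiny stopword list stands in for real keyword extraction.
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and", "is"}


def extract_keywords(task: str) -> frozenset[str]:
    """Toy extractor: lowercase alphanumeric tokens minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+", task.lower())
    return frozenset(t for t in tokens if t not in STOPWORDS)


@dataclass
class PlanCache:
    """Stores plan templates keyed by the keywords of the task that produced them."""

    threshold: float = 0.5  # assumed minimum keyword overlap for a cache hit
    entries: list[tuple[frozenset[str], list[str]]] = field(default_factory=list)

    def store(self, task: str, plan_template: list[str]) -> None:
        """Cache a template extracted from a completed execution."""
        self.entries.append((extract_keywords(task), list(plan_template)))

    def lookup(self, task: str) -> list[str] | None:
        """Return the best-matching template, or None on a cache miss."""
        query = extract_keywords(task)
        best, best_score = None, 0.0
        for keywords, template in self.entries:
            union = query | keywords
            score = len(query & keywords) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = template, score
        return best if best_score >= self.threshold else None


def adapt(template: list[str], **slots: str) -> list[str]:
    """Stand-in for model-based adaptation: fill named placeholders."""
    return [step.format(**slots) for step in template]


cache = PlanCache()
cache.store(
    "Compute the revenue growth of company for year",
    ["fetch filings for {company}", "extract revenue for {year}", "compute growth"],
)

hit = cache.lookup("Compute revenue growth of the company for the year")
if hit is not None:
    print(adapt(hit, company="Acme", year="2024"))

miss = cache.lookup("Book a flight to Paris")  # no shared keywords, so a miss
```

On a miss, an agent would fall back to full planning with the larger model and could then store the resulting plan as a new template. The threshold trades hit rate against the risk of reusing an ill-fitting plan, and the value used here is arbitrary.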

Taxonomy

Topics: Mobile Agent-Based Network Management · Distributed systems and fault tolerance · Caching and Content Delivery