Don't Break the Cache: An Evaluation of Prompt Caching for Long-Horizon Agentic Tasks

Elias Lumer; Faheem Nizar; Akshaya Jangiti; Kevin Frank; Anmol Gulati; Mandar Phadate; Vamse Kumar Subbiah

arXiv:2601.06007 · cs.CL · February 3, 2026

TL;DR

This paper evaluates prompt caching strategies for large language model agents executing multi-turn, tool-using tasks, demonstrating significant cost and latency reductions and providing practical guidance for deployment.

Contribution

To the authors' knowledge, the paper offers the first comprehensive analysis of prompt caching benefits and strategies for agentic LLM workloads, spanning three providers (OpenAI, Anthropic, and Google) and several caching configurations.

Findings

1. Prompt caching reduces API costs by 41-80%.
2. Time to first token (TTFT) improves by 13-31%.
3. Strategic cache placement outperforms naive full-context caching.
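The cost findings rest on a simple mechanism: an agent resends its growing history on every call, and caching lets the already-seen prefix be billed at a discount. The minimal sketch that follows models only that mechanism. It assumes every call reuses the previous call's full prompt as a cached prefix, and the multipliers `cache_write` and `cache_read` are hypothetical placeholders, not any provider's published rates or the paper's measurements.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Hypothetical multipliers relative to the base price of one input token.
    base: float = 1.0          # uncached input token
    cache_write: float = 1.25  # token written to the cache (hypothetical premium)
    cache_read: float = 0.1    # token served from the cache (hypothetical discount)


def agent_input_cost(turn_tokens: list[int], pricing: Pricing, use_cache: bool) -> float:
    """Input-token cost of an agent loop that resends its full history on each call.

    turn_tokens[i] is the number of new tokens appended before call i.
    With caching, the previous call's prompt is read from the cache and
    only the newly appended tokens are written to it.
    """
    total = 0.0
    context = 0  # tokens already sent (and, with caching, already cached)
    for new in turn_tokens:
        if use_cache:
            total += context * pricing.cache_read + new * pricing.cache_write
        else:
            total += (context + new) * pricing.base
        context += new
    return total


# Hypothetical run: a 2,000-token opening prompt, then nine 1,000-token tool turns.
turns = [2000] + [1000] * 9
uncached = agent_input_cost(turns, Pricing(), use_cache=False)
cached = agent_input_cost(turns, Pricing(), use_cache=True)
savings = 1 - cached / uncached
print(round(savings, 3))  # → 0.705
```

The roughly 70% saving in this toy run is a property of the assumed multipliers and turn sizes, not a result from the paper. Real savings also depend on minimum cacheable prompt lengths, cache expiry, and whether the chosen strategy keeps the prefix identical from call to call.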

Abstract

Recent advancements in Large Language Model (LLM) agents have enabled complex multi-turn agentic tasks requiring extensive tool calling, where conversations can span dozens of API calls with increasingly large context windows. However, although major LLM providers offer prompt caching to reduce cost and latency, its benefits for agentic workloads remain underexplored in the research literature. To our knowledge, no prior work quantifies these cost savings or compares caching strategies for multi-turn agentic tasks. We present a comprehensive evaluation of prompt caching across three major LLM providers (OpenAI, Anthropic, and Google) and compare three caching strategies: full-context caching, system-prompt-only caching, and caching that excludes dynamic tool results. We evaluate on DeepResearch Bench, a multi-turn agentic benchmark where agents autonomously execute real-world…
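The three strategies differ in where the cached prefix ends. The minimal sketch that follows reflects one possible reading of them, assuming a conversation is a list of dicts with a hypothetical `kind` field and that exactly one cache breakpoint is placed per request. The `cache_control: {"type": "ephemeral"}` marker is borrowed from Anthropic's Messages API, but the surrounding message shape is simplified and is not a valid request body. `breakpoint_index` and `mark_cache` are illustrative names, not the authors' code.

```python
from typing import Literal

Strategy = Literal["full_context", "system_only", "exclude_tool_results"]


def breakpoint_index(messages: list[dict], strategy: Strategy) -> int:
    """Index of the message at which the cached prefix ends (inclusive)."""
    if not messages:
        raise ValueError("conversation is empty")
    if strategy == "system_only":
        return 0  # assumes messages[0] is the system prompt
    if strategy == "full_context":
        return len(messages) - 1  # cache everything sent so far
    if strategy == "exclude_tool_results":
        # Walk back past trailing tool results so they stay outside the cached prefix.
        for i in range(len(messages) - 1, -1, -1):
            if messages[i]["kind"] != "tool_result":
                return i
        return 0
    raise ValueError(f"unknown strategy: {strategy}")


def mark_cache(messages: list[dict], strategy: Strategy) -> list[dict]:
    """Return a copy of the conversation with one cache marker attached."""
    marked = [dict(m) for m in messages]
    marked[breakpoint_index(messages, strategy)]["cache_control"] = {"type": "ephemeral"}
    return marked


history = [
    {"kind": "system", "text": "You are a research agent."},
    {"kind": "user", "text": "Survey prompt caching for agents."},
    {"kind": "assistant", "text": "Calling web_search(...)"},
    {"kind": "tool_result", "text": "<search results>"},
]
for s in ("full_context", "system_only", "exclude_tool_results"):
    print(s, breakpoint_index(history, s))
# full_context 3
# system_only 0
# exclude_tool_results 2
```

Because a prefix cache only hits when the prefix is identical, moving the breakpoint earlier trades a shorter cached prefix for fewer cache writes of content that may never be reused.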

Taxonomy

Topics: Topic Modeling · Web Data Mining and Analysis · Mobile Crowdsensing and Crowdsourcing