ToolCaching: Towards Efficient Caching for LLM Tool-calling

Yi Zhai; Dian Shen; Junzhou Luo; Bin Yang

arXiv:2601.15335·cs.SE·January 23, 2026

Open Access

TL;DR

This paper introduces ToolCaching, an adaptive caching framework for LLM tool-calling that improves cache efficiency by considering semantic and system features, leading to higher hit ratios and lower latency.

Contribution

It presents a novel feature-driven, adaptive caching framework with the VAAC algorithm, tailored for heterogeneous and dynamic LLM tool-calling workloads.

Findings

01 Up to 11% higher cache hit ratio

02 34% lower latency

03 Effective acceleration of LLM tool-calling applications

Abstract

Recent advances in Large Language Models (LLMs) have revolutionized web applications, enabling intelligent search, recommendation, and assistant services with natural language interfaces. Tool-calling extends LLMs with the ability to interact with external APIs, greatly enhancing their practical utility. While prior research has improved tool-calling performance by adopting traditional computer systems techniques, such as parallel and asynchronous execution, the challenge of redundant or repeated tool-calling requests remains largely unaddressed. Caching is a classic solution to this problem, but applying it to LLM tool-calling introduces new difficulties due to heterogeneous request semantics, dynamic workloads, and varying freshness requirements, which render conventional cache policies ineffective. To address these issues, we propose ToolCaching, an efficient feature-driven and…
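To make the freshness problem described in the abstract concrete, here is a minimal sketch of an exact-match tool-call cache with a per-tool time-to-live (TTL). It is not the paper's ToolCaching framework or its VAAC algorithm, whose details are not given in this excerpt; it only illustrates why a single cache policy fits heterogeneous tools poorly. The class name `ToolCallCache`, the `ttl_seconds` mapping, and the example tool names are all assumptions made for illustration.

```python
import json
import time
from typing import Any, Callable, Dict, Tuple


class ToolCallCache:
    """Hypothetical exact-match cache for tool calls with per-tool TTLs.

    Illustrative only: this is a plain TTL cache, not the VAAC algorithm.
    """

    def __init__(
        self,
        ttl_seconds: Dict[str, float],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Assumed configuration: tool name -> how long a result stays fresh.
        # Tools missing from the mapping are never cached.
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.store: Dict[str, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(tool: str, args: Dict[str, Any]) -> str:
        # Sort keys so that argument order does not change the cache key.
        return tool + ":" + json.dumps(args, sort_keys=True)

    def call(self, tool: str, args: Dict[str, Any], fn: Callable[..., Any]) -> Any:
        ttl = self.ttl_seconds.get(tool, 0.0)
        key = self.make_key(tool, args)
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and now - entry[0] < ttl:
            self.hits += 1
            return entry[1]
        # Miss or stale entry: invoke the real tool and refresh the cache.
        self.misses += 1
        result = fn(**args)
        if ttl > 0:
            self.store[key] = (now, result)
        return result


if __name__ == "__main__":
    def lookup_weather(city: str) -> str:  # stand-in for a real external API
        return f"forecast for {city}"

    cache = ToolCallCache({"weather": 600.0})
    cache.call("weather", {"city": "Paris"}, lookup_weather)
    cache.call("weather", {"city": "Paris"}, lookup_weather)
    print(cache.hits, cache.misses)
```

A fixed TTL per tool is the simplest way to express varying freshness requirements; it ignores semantically similar but non-identical requests and shifting workloads, which are the difficulties the abstract says conventional policies handle poorly.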

Taxonomy

Topics: Caching and Content Delivery · Cloud Computing and Resource Management · Big Data and Digital Economy