ToolCaching: Towards Efficient Caching for LLM Tool-calling
Yi Zhai, Dian Shen, Junzhou Luo, Bin Yang

TL;DR
This paper introduces ToolCaching, an adaptive caching framework for LLM tool-calling that improves cache efficiency by considering semantic and system features, leading to higher hit ratios and lower latency.
Contribution
The paper presents a novel feature-driven, adaptive caching framework, together with the VAAC algorithm, tailored to heterogeneous and dynamic LLM tool-calling workloads.
Findings
Up to 11% higher cache hit ratio
34% lower latency
Effective acceleration of LLM tool-calling applications
Abstract
Recent advances in Large Language Models (LLMs) have revolutionized web applications, enabling intelligent search, recommendation, and assistant services with natural language interfaces. Tool-calling extends LLMs with the ability to interact with external APIs, greatly enhancing their practical utility. While prior research has improved tool-calling performance by adopting traditional computer systems techniques, such as parallel and asynchronous execution, the challenge of redundant or repeated tool-calling requests remains largely unaddressed. Caching is a classic solution to this problem, but applying it to LLM tool-calling introduces new difficulties due to heterogeneous request semantics, dynamic workloads, and varying freshness requirements, which render conventional cache policies ineffective. To address these issues, we propose ToolCaching, an efficient feature-driven and…
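To make the problem in the abstract concrete, here is a minimal sketch of a baseline tool-call cache in which repeated requests are served from memory and each tool has its own freshness window. It is not the paper's VAAC algorithm or the ToolCaching framework. The class name `ToolCallCache`, the per-tool TTL table, and the injectable `clock` are all hypothetical, and the sketch assumes tool arguments are JSON-serializable.

```python
import json
import time
from typing import Any, Callable, Dict, Tuple


class ToolCallCache:
    """Illustrative per-tool TTL cache for tool-call results.

    Hypothetical baseline only; this is NOT the paper's VAAC algorithm.
    """

    def __init__(
        self,
        ttl_seconds: Dict[str, float],
        default_ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds          # freshness window per tool name
        self._default_ttl = default_ttl  # used for tools without an entry
        self._clock = clock              # injectable so expiry can be tested
        self._store: Dict[Tuple[str, str], Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(tool: str, args: Dict[str, Any]) -> Tuple[str, str]:
        # Canonical JSON so that argument order does not change the key.
        # Assumes args are JSON-serializable.
        return tool, json.dumps(args, sort_keys=True)

    def call(self, tool: str, args: Dict[str, Any], fn: Callable[..., Any]) -> Any:
        key = self._key(tool, args)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[0]:
            # Entry exists and has not expired: serve it without calling the tool.
            self.hits += 1
            return entry[1]
        # Missing or stale: invoke the tool and store the result with its expiry time.
        self.misses += 1
        result = fn(**args)
        ttl = self._ttl.get(tool, self._default_ttl)
        self._store[key] = (now + ttl, result)
        return result


if __name__ == "__main__":
    def get_weather(city: str) -> str:  # stand-in for a real external API
        return f"sunny in {city}"

    cache = ToolCallCache({"get_weather": 300.0})
    cache.call("get_weather", {"city": "Paris"}, get_weather)
    cache.call("get_weather", {"city": "Paris"}, get_weather)
    print(cache.hits, cache.misses)
```

A fixed TTL per tool is exactly the kind of conventional policy the abstract describes as ineffective under heterogeneous semantics and dynamic workloads; the sketch only shows the baseline that an adaptive scheme would improve on.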
Taxonomy
Topics: Caching and Content Delivery · Cloud Computing and Resource Management · Big Data and Digital Economy
