LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching
Simranjit Singh, Michael Fore, Andreas Karatzas, Chaehong Lee, Yanan Jian, Longfei Shangguan, Fuxun Yu, Iraklis Anagnostopoulos, Dimitrios Stamoulis

TL;DR
This paper presents LLM-dCache, a method that enables Large Language Models to autonomously manage data caching through API calls, significantly reducing data access overhead in large-scale, tool-augmented environments.
Contribution
Introduces LLM-dCache, an approach that lets LLMs control cache operations via prompting, improving data access efficiency in large-scale systems.
Findings
Improves Copilot times by an average of 1.24x across various LLMs and prompting techniques.
Effectively manages cache operations through LLM prompting.
Demonstrates the approach on an industry-scale, massively parallel platform spanning hundreds of GPT endpoints and terabytes of imagery.
Abstract
As Large Language Models (LLMs) broaden their capabilities to manage thousands of API calls, they are confronted with complex data operations across vast datasets with significant overhead to the underlying system. In this work, we introduce LLM-dCache to optimize data accesses by treating cache operations as callable API functions exposed to the tool-augmented agent. We grant LLMs the autonomy to manage cache decisions via prompting, seamlessly integrating with existing function-calling mechanisms. Tested on an industry-scale massively parallel platform that spans hundreds of GPT endpoints and terabytes of imagery, our method improves Copilot times by an average of 1.24x across various LLMs and prompting techniques.
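To make the idea of "cache operations as callable API functions" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the tool names `read_cache` and `load_from_source`, the `LocalCache` class, the LRU eviction policy, the capacity of 2, and the JSON tool-call shape are all hypothetical choices made for this example. The summary above does not specify any of them.

```python
import json
from collections import OrderedDict


class LocalCache:
    """Small in-memory cache with LRU eviction (eviction policy is an assumption)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._store = OrderedDict()

    def read(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def write(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

    def keys(self):
        return list(self._store)


def slow_load(key):
    """Hypothetical stand-in for an expensive data access (e.g. loading imagery)."""
    return f"data for {key}"


cache = LocalCache(capacity=2)


# Hypothetical tools exposed to the agent next to its other API functions.
def read_cache(key):
    value = cache.read(key)
    return {"hit": value is not None, "data": value}


def load_from_source(key):
    data = slow_load(key)
    cache.write(key, data)  # keep the loaded data for later calls
    return {"data": data}


TOOLS = {"read_cache": read_cache, "load_from_source": load_from_source}

# Tool descriptions that would be placed in the prompt (format is an assumption).
TOOL_SPECS = [
    {"name": "read_cache", "description": "Return data for `key` if it is cached.",
     "parameters": {"key": "string"}},
    {"name": "load_from_source", "description": "Load `key` from main storage and cache it.",
     "parameters": {"key": "string"}},
]


def describe_cache():
    """Text the prompt could include so the model can decide between the tools."""
    return "Cached keys: " + ", ".join(cache.keys())


def dispatch(tool_call_json):
    """Execute one tool call emitted by the model as a JSON string."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["arguments"])
```

In this sketch the model, not the surrounding system, chooses between `read_cache` and `load_from_source`, using the tool descriptions and the output of `describe_cache()` in its prompt. A call such as `dispatch('{"name": "load_from_source", "arguments": {"key": "a"}}')` loads the data once, and a later `read_cache` call for the same key is served from memory.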
Taxonomy
Topics: Advanced Data Storage Technologies · Digital Rights Management and Security
Methods: Attention Is All You Need · Byte Pair Encoding · Adam · Attention Dropout · Linear Layer · Multi-Head Attention · Dropout · Dense Connections · Cosine Annealing
