LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching

Simranjit Singh; Michael Fore; Andreas Karatzas; Chaehong Lee; Yanan Jian; Longfei Shangguan; Fuxun Yu; Iraklis Anagnostopoulos; Dimitrios Stamoulis

arXiv:2406.06799·cs.DC·September 24, 2024

TL;DR

This paper presents LLM-dCache, a method that lets Large Language Models manage data caching themselves through API calls, reducing data access overhead in large-scale, tool-augmented environments.

Contribution

The paper introduces LLM-dCache, an approach that lets LLMs control cache operations via prompting, improving data access efficiency in large-scale systems.

Findings

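01. Improves Copilot times by 1.24x on average across various LLMs and prompting techniques.

02. Lets the LLM manage cache operations through prompting, using its existing function-calling mechanism.

03. Evaluated on an industry-scale, massively parallel platform spanning hundreds of GPT endpoints and terabytes of imagery.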

Abstract

As Large Language Models (LLMs) broaden their capabilities to manage thousands of API calls, they are confronted with complex data operations across vast datasets with significant overhead to the underlying system. In this work, we introduce LLM-dCache to optimize data accesses by treating cache operations as callable API functions exposed to the tool-augmented agent. We grant LLMs the autonomy to manage cache decisions via prompting, seamlessly integrating with existing function-calling mechanisms. Tested on an industry-scale massively parallel platform that spans hundreds of GPT endpoints and terabytes of imagery, our method improves Copilot times by an average of 1.24x across various LLMs and prompting techniques.
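
To make the idea of "cache operations as callable API functions" concrete, here is a minimal Python sketch. It is not the authors' implementation. The names `LocalCache`, `slow_load`, `build_tools`, `cache_prompt`, and `dispatch` are hypothetical, the LRU eviction policy and the capacity of 2 are assumptions chosen for illustration, and the tool descriptions follow the general JSON-schema style used by function-calling APIs. For brevity, the cache is updated programmatically inside `load_db` rather than by a separate model-issued call.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, List


class LocalCache:
    """Tiny LRU cache (assumed policy) keyed by a dataset identifier."""

    def __init__(self, capacity: int = 2) -> None:
        self.capacity = capacity
        self._store: "OrderedDict[str, Any]" = OrderedDict()

    def read(self, key: str) -> Any:
        value = self._store[key]      # raises KeyError on a cache miss
        self._store.move_to_end(key)  # mark as most recently used
        return value

    def update(self, key: str, value: Any) -> None:
        self._store[key] = value
        self._store.move_to_end(key)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

    def keys(self) -> List[str]:
        return list(self._store)  # oldest first


def slow_load(key: str) -> Dict[str, Any]:
    """Hypothetical stand-in for an expensive fetch from the main data store."""
    return {"key": key, "payload": f"data for {key}"}


def build_tools(cache: LocalCache) -> Dict[str, Callable[..., Any]]:
    """Return the callables the agent may invoke by name."""

    def load_db(key: str) -> Any:
        data = slow_load(key)
        cache.update(key, data)  # simplification: update happens here
        return data

    def read_cache(key: str) -> Any:
        return cache.read(key)

    return {"load_db": load_db, "read_cache": read_cache}


def tool_spec(name: str, description: str) -> Dict[str, Any]:
    """Describe one tool in a JSON-schema style for a function-calling API."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {"key": {"type": "string"}},
                "required": ["key"],
            },
        },
    }


TOOL_SPECS = [
    tool_spec("load_db", "Load data from the main store (slow)."),
    tool_spec("read_cache", "Read data from the local cache (fast)."),
]


def cache_prompt(cache: LocalCache) -> str:
    """Text added to the prompt so the model can see what is cached."""
    return f"Cached keys: {cache.keys()}. Prefer read_cache for these keys."


def dispatch(tools: Dict[str, Callable[..., Any]], name: str,
             arguments: Dict[str, Any]) -> Any:
    """Execute a tool call (name plus arguments) emitted by the model."""
    return tools[name](**arguments)
```

In use, the agent loop would send `TOOL_SPECS` and `cache_prompt(cache)` to the model, then pass each returned tool call to `dispatch`. Whether the model picks `read_cache` or `load_db` for a given key is exactly the decision the abstract says is delegated to the LLM.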

Taxonomy

Topics: Advanced Data Storage Technologies · Digital Rights Management and Security

Methods: Attention Is All You Need · Byte Pair Encoding · Adam · Attention Dropout · Linear Layer · Multi-Head Attention · Dropout · Dense Connections · Cosine Annealing