VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework

Zhi Yao; Zhiqing Tang; Jiong Lou; Ping Shen; Weijia Jia

arXiv:2406.13399·cs.AI·June 21, 2024·1 cites

TL;DR

The paper introduces VELO, a framework that uses vector database caching and multi-agent reinforcement learning to optimize LLM QoS at the edge, reducing delays and costs without modifying LLMs.

Contribution

VELO is a novel framework that leverages vector databases and MARL for cloud-edge LLM QoS optimization, applicable to diverse LLMs without internal modifications.

Findings

1. Significantly reduces response delay and resource consumption.

2. Enhances user satisfaction with improved QoS.

3. Effective in real edge system deployments.

Abstract

Large Language Models (LLMs) have gained significant popularity and are used extensively across various domains. Most LLM deployments occur within cloud data centers, where they encounter substantial response delays and incur high costs, thereby degrading the Quality of Service (QoS) at the network edge. Using a vector database to cache LLM request results at the edge can substantially reduce the response delay and cost of similar requests, an opportunity that previous research has overlooked. To address this gap, this paper introduces a novel Vector database-assisted cloud-Edge collaborative LLM QoS Optimization (VELO) framework. Firstly, we propose the VELO framework, which employs a vector database to cache the results of some LLM requests at the edge, reducing the response time of subsequent similar requests. Diverging from direct optimization of…
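The caching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `embed`, `EdgeCache`, and `answer` are hypothetical, the bag-of-words embedding is a toy stand-in for a real text encoder, the in-memory list stands in for a vector database, and the fixed similarity threshold replaces the learned (multi-agent reinforcement learning) decision that the paper describes.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: a real system would use a text encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class EdgeCache:
    """In-memory stand-in for an edge vector database (hypothetical)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed fixed cutoff, not from the paper
        self.entries = []           # list of (vector, cached response)

    def lookup(self, request):
        """Return the most similar cached response, or None below the threshold."""
        query = embed(request)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, request, response):
        self.entries.append((embed(request), response))


def answer(request, cache, cloud_llm):
    """Serve from the edge cache when possible, otherwise call the cloud LLM."""
    cached = cache.lookup(request)
    if cached is not None:
        return cached, "edge"
    response = cloud_llm(request)   # slow, costly path
    cache.store(request, response)  # make it available to later similar requests
    return response, "cloud"
```

Here `cloud_llm` is any callable that maps a request string to a response string. A repeated or near-identical request is answered from the edge without reaching the cloud, which is the source of the delay and cost savings the abstract claims.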

Taxonomy

Topics: Cloud Computing and Resource Management