PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services

Zheming Yang, Yuanhao Yang, Chang Zhao, Qi Guo, Wenkai He, and Wen Ji

arXiv:2405.14636·cs.DC·May 24, 2024·6 cites

Open Access

TL;DR

PerLLM is a personalized edge-cloud inference scheduling framework for diverse large language model services that optimizes resource allocation, reduces energy costs, and improves throughput under various constraints.

Contribution

It introduces a novel scheduling framework using an upper confidence bound algorithm for efficient edge-cloud collaboration in LLM services.
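To make the idea concrete, here is a minimal sketch of an upper confidence bound (UCB1) scheduler that only considers servers expected to meet a request's deadline. It is not the authors' implementation: the class name `ConstrainedUCB`, the reward definition (one minus a normalized energy cost), the fallback rule when no server is feasible, and all latency and energy numbers are assumptions chosen for illustration.

```python
import math


class ConstrainedUCB:
    """UCB1 over servers, restricted to servers expected to meet a deadline."""

    def __init__(self, n_servers, exploration=2.0):
        self.counts = [0] * n_servers         # times each server was chosen
        self.mean_reward = [0.0] * n_servers  # running mean reward per server
        self.exploration = exploration

    def select(self, est_latency, deadline):
        # Constraint step: keep servers whose estimated latency fits the deadline.
        feasible = [i for i, t in enumerate(est_latency) if t <= deadline]
        if not feasible:
            # Fallback (an assumption of this sketch): choose the fastest server.
            return min(range(len(est_latency)), key=lambda i: est_latency[i])
        # Try each feasible server once before trusting its estimate.
        for i in feasible:
            if self.counts[i] == 0:
                return i
        total = sum(self.counts)

        def ucb(i):
            bonus = math.sqrt(self.exploration * math.log(total) / self.counts[i])
            return self.mean_reward[i] + bonus

        return max(feasible, key=ucb)

    def update(self, server, reward):
        # Incremental update of the running mean reward.
        self.counts[server] += 1
        n = self.counts[server]
        self.mean_reward[server] += (reward - self.mean_reward[server]) / n


# Hypothetical setup: two edge servers and one slower cloud path.
scheduler = ConstrainedUCB(n_servers=3)
est_latency = [0.8, 0.3, 1.5]  # seconds per request (made-up values)
energy = [0.2, 0.6, 0.9]       # normalized energy cost per request (made-up values)

for _ in range(50):
    server = scheduler.select(est_latency, deadline=1.0)
    scheduler.update(server, reward=1.0 - energy[server])

print(scheduler.counts)
```

In this toy run the third server never satisfies the 1.0 s deadline, so it is never selected, and the lower-energy feasible server accumulates the most requests. A real scheduler would also need latency estimates that change with load and bandwidth, which this sketch treats as fixed inputs.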

Findings

01. Achieves 2.2x, 2.1x, and 1.6x throughput improvements

02. Reduces energy costs by over 50%

03. Effectively meets personalized processing time requirements

Abstract

With the rapid growth in the number of large language model (LLM) users, it is difficult for bandwidth-constrained cloud servers to simultaneously process massive LLM services in real-time. Recently, edge-cloud infrastructures have been used to improve the processing efficiency of large-scale LLM services. However, the diversity of task requirements and the dynamics of resources pose great challenges to inference scheduling, leading to the wastage of many resources. In this paper, we present PerLLM, a personalized inference scheduling framework with edge-cloud collaboration designed for diverse LLM services. For the complexity of multiple constraints and the decision-making process of edge-cloud collaboration, we integrate the upper confidence bound algorithm based on the constraint satisfaction mechanism in PerLLM. For diverse LLM services, PerLLM can optimize service scheduling and…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Scientific Computing and Data Management · IoT and Edge/Fog Computing