EdgeShard: Efficient LLM Inference via Collaborative Edge Computing

Mingjin Zhang; Jiannong Cao; Xiaoming Shen; Zeyang Cui

arXiv:2405.14371 · cs.DC · May 24, 2024 · 1 citation

Open Access

TL;DR

EdgeShard introduces a collaborative edge computing framework that partitions large language models across devices and cloud to reduce latency and increase throughput, addressing privacy and bandwidth issues.

Contribution

The paper proposes a novel model partitioning and device selection framework for efficient LLM inference on edge-cloud systems, with an adaptive optimization algorithm.

Findings

01. Achieves up to 50% latency reduction
02. Doubles throughput compared to baseline methods
03. Demonstrates effectiveness on Llama2 models

Abstract

Large language models (LLMs) have shown great potential in natural language processing and content generation. However, current LLMs rely heavily on cloud computing, leading to prolonged latency, high bandwidth cost, and privacy concerns. Edge computing promises to address these concerns by deploying LLMs on edge devices, closer to data sources. Some works use model quantization to shrink the model so that it fits on resource-constrained edge devices, but this leads to accuracy loss. Other works use cloud-edge collaboration, which suffers from unstable network connections. In this work, we leverage collaborative edge computing to facilitate collaboration among edge devices and cloud servers for jointly performing efficient LLM inference. We propose a general framework to partition the LLM into shards and deploy them on distributed devices. To achieve efficient LLM inference,…
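
To make the sharding idea concrete, here is a minimal sketch of splitting a model's layers into contiguous shards across devices. It is an illustration under stated assumptions, not the paper's optimization algorithm: the greedy fill rule, the `Device` type, the `partition_layers` function, the device names, and all memory figures are hypothetical, and the sketch ignores network bandwidth and compute speed entirely.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str        # hypothetical device label
    memory_mb: int   # memory assumed available for model weights


def partition_layers(layer_sizes_mb, devices):
    """Greedily assign contiguous layers to devices in the given order.

    Returns a list of (device_name, start, end) tuples, with `end` exclusive.
    Raises ValueError if the layers do not fit on the devices.
    This is a simple stand-in for a real placement algorithm.
    """
    shards = []
    start = 0
    n = len(layer_sizes_mb)
    for dev in devices:
        used = 0
        end = start
        # Keep adding layers while the next one still fits in memory.
        while end < n and used + layer_sizes_mb[end] <= dev.memory_mb:
            used += layer_sizes_mb[end]
            end += 1
        if end > start:
            shards.append((dev.name, start, end))
        start = end
        if start == n:
            break
    if start < n:
        raise ValueError("model does not fit on the given devices")
    return shards


# Assumed example: 32 layers of 400 MB each, two edge devices and one cloud server.
layers = [400] * 32
devices = [Device("edge-a", 6000), Device("edge-b", 4000), Device("cloud", 24000)]
plan = partition_layers(layers, devices)
for name, lo, hi in plan:
    print(f"{name}: layers {lo}..{hi - 1}")
```

A greedy fill like this only respects memory limits. A placement that targets latency or throughput would also need to weigh per-device compute time and the cost of sending activations between devices.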

Taxonomy

Topics: Blockchain Technology Applications and Security · Digital Rights Management and Security · Cloud Data Security Solutions