DeServe: Towards Affordable Offline LLM Inference via Decentralization

Linyu Wu; Xiaoyuan Liu; Tianneng Shi; Zhe Ye; Dawn Song

arXiv:2501.14784 · cs.DC · January 28, 2025

TL;DR

DeServe is a decentralized serving system for offline large language model (LLM) inference. It uses idle GPU resources to lower cost and improve serving throughput, particularly in high-latency network environments.

Contribution

This paper introduces DeServe, a decentralized system that enables affordable, high-throughput offline LLM inference by pooling idle GPU resources and optimizing serving throughput for high-latency networks.

Findings

01 Achieves a 6.7x-12.6x throughput improvement over existing serving-system baselines in high-latency network environments.

02 Uses idle GPU resources to lower the cost of LLM inference.

03 Optimizes serving throughput for high-latency network environments.

Abstract

The rapid growth of generative AI and its integration into everyday workflows have significantly increased the demand for large language model (LLM) inference services. While proprietary models remain popular, recent advancements in open-source LLMs have positioned them as strong contenders. However, deploying these models is often constrained by the high costs and limited availability of GPU resources. In response, this paper presents the design of a decentralized offline serving system for LLM inference. Utilizing idle GPU resources, our proposed system, DeServe, decentralizes access to LLMs at a lower cost. DeServe specifically addresses key challenges in optimizing serving throughput in high-latency network environments. Experiments demonstrate that DeServe achieves a 6.7x-12.6x improvement in throughput over existing serving system baselines in such conditions.
