Distributed On-Device LLM Inference With Over-the-Air Computation

Kai Zhang, Hengtao He, Shenghui Song, Jun Zhang, and Khaled B. Letaief

arXiv:2502.12559 · cs.DC · February 19, 2025

Open Access

TL;DR

This paper introduces a distributed on-device LLM inference framework that uses tensor parallelism and over-the-air computation to reduce latency and communication overhead on edge devices.

Contribution

The paper proposes an over-the-air computation method that speeds up all-reduce operations, combined with joint model assignment and transceiver optimization, for efficient distributed LLM inference.

Findings

01. Significantly reduces inference latency.

02. Improves inference accuracy.

03. Enables practical deployment of LLMs on resource-constrained devices.

Abstract

Large language models (LLMs) have achieved remarkable success across various artificial intelligence tasks. However, their enormous size and computational demands pose significant challenges for deployment on edge devices. To address this issue, we present a distributed on-device LLM inference framework based on tensor parallelism, which partitions the neural network tensors (e.g., weight matrices) of an LLM among multiple edge devices for collaborative inference. Nevertheless, tensor parallelism involves frequent all-reduce operations to aggregate intermediate layer outputs across participating devices during inference, resulting in substantial communication overhead. To mitigate this bottleneck, we propose an over-the-air computation method that leverages the analog superposition property of wireless multiple-access channels to facilitate fast all-reduce operations. To minimize the…
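The two ideas in the abstract, tensor parallelism and all-reduce by analog superposition, can be illustrated with a small simulation. The sketch below is not the paper's method: it assumes a single linear layer split by columns, real-valued scalar channel gains that are perfectly known, channel-inversion precoding with no power constraint, and additive Gaussian receiver noise. All function names, dimensions, and parameter values are hypothetical and chosen only for illustration.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def split_columns(W, x, num_devices):
    """Partition W by columns (and x by entries) into equal contiguous shards."""
    d_in = len(x)
    assert d_in % num_devices == 0, "illustrative sketch: equal shards only"
    step = d_in // num_devices
    shards = []
    for n in range(num_devices):
        lo, hi = n * step, (n + 1) * step
        shards.append(([row[lo:hi] for row in W], x[lo:hi]))
    return shards

def over_the_air_all_reduce(partials, noise_std, rng):
    """Simulate one analog aggregation over a multiple-access channel."""
    d_out = len(partials[0])
    # Hypothetical real-valued channel gains, assumed perfectly known.
    gains = [rng.uniform(0.5, 1.5) for _ in partials]
    received = [0.0] * d_out
    for h, z in zip(gains, partials):
        tx = [v / h for v in z]       # channel-inversion precoding (assumption)
        for i in range(d_out):
            received[i] += h * tx[i]  # the channel scales and superimposes signals
    # Additive receiver noise perturbs the aggregated result.
    return [r + rng.gauss(0.0, noise_std) for r in received]

rng = random.Random(0)
d_out, d_in, num_devices = 4, 8, 4
W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
x = [rng.uniform(-1, 1) for _ in range(d_in)]

# Each device computes a partial output from its own shard of the weights.
partials = [matvec(W_n, x_n) for W_n, x_n in split_columns(W, x, num_devices)]

exact = matvec(W, x)                                  # single-device reference
digital_sum = [sum(col) for col in zip(*partials)]    # conventional all-reduce sum
noiseless = over_the_air_all_reduce(partials, 0.0, rng)
noisy = over_the_air_all_reduce(partials, 0.05, rng)

# Transmission mean-squared error of the noisy over-the-air aggregate.
mse = sum((a - b) ** 2 for a, b in zip(noisy, exact)) / d_out
print("transmission MSE:", mse)
```

In this toy setting the sum of the per-device partial outputs equals the full layer output, so the all-reduce step is the only communication needed per layer. With over-the-air aggregation, all devices transmit at once and the channel itself performs the summation, at the cost of an aggregation error that depends on noise and on how the transceivers are configured.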

Taxonomy

Topics: Real-time simulation and control systems · Medical Imaging Techniques and Applications · Image and Signal Denoising Methods