Model-Distributed Inference for Large Language Models at the Edge

Davide Macario; Hulya Seferoglu; Erdem Koyuncu

arXiv:2505.18164·cs.LG·May 27, 2025

Open Access

TL;DR

This paper presents MDI-LLM, a framework that runs large language models across multiple edge devices by partitioning the model and having the devices cooperate on inference, so that models too large for the memory of any single device can still be deployed.

Contribution

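The paper introduces a distributed inference framework for deploying large models on low-power edge devices, together with "recurrent pipeline parallelism", a technique that reduces idle time on each device when multiple text sequences are generated in parallel.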

Findings

01. Enables deployment of LLMs that exceed the memory capacity of a single edge device.

02. Improves inference throughput with multiple devices.

03. Reduces memory usage per device.
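
To make the per-device memory point concrete, here is a minimal sketch that splits a stack of layers into contiguous partitions and estimates the weight memory each device would hold. It assumes a near-even split and counts only per-layer weights; embeddings, the output head, caches, and activations are ignored. The function names and all numbers are hypothetical and are not taken from the paper.

```python
def partition_layers(n_layers: int, n_devices: int) -> list[range]:
    """Split layer indices 0..n_layers-1 into contiguous, near-equal chunks."""
    if n_devices < 1 or n_layers < n_devices:
        raise ValueError("need at least one layer per device")
    base, extra = divmod(n_layers, n_devices)
    chunks, start = [], 0
    for device in range(n_devices):
        # The first `extra` devices take one additional layer each.
        size = base + (1 if device < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def per_device_bytes(n_layers: int, n_devices: int,
                     params_per_layer: int, bytes_per_param: int = 2) -> list[int]:
    """Weight bytes held by each device (assumption: identical layers, weights only)."""
    return [len(chunk) * params_per_layer * bytes_per_param
            for chunk in partition_layers(n_layers, n_devices)]


if __name__ == "__main__":
    # Hypothetical model: 10 identical layers of 1000 parameters, 2 bytes each.
    print(partition_layers(10, 3))
    print(per_device_bytes(10, 3, params_per_layer=1000))
```

Under these assumptions, no device holds more than roughly a third of the weights, which is the effect the third finding describes. The partitioning scheme in the paper itself may differ.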

Abstract

We introduce Model-Distributed Inference for Large-Language Models (MDI-LLM), a novel framework designed to facilitate the deployment of state-of-the-art large-language models (LLMs) across low-power devices at the edge. This is accomplished by dividing the model into multiple partitions, which are then assigned to different devices/nodes within the network. These nodes exchange intermediate activation vectors via device-to-device links, enabling collaborative computation. To enhance the efficiency of this process, we propose the "recurrent pipeline parallelism" technique, which reduces idle time on each device and facilitates parallel inference during the generation of multiple text sequences. By leveraging the combined computational resources of multiple edge devices, MDI-LLM enables the deployment of LLMs that exceed the memory capacity of individual devices, making it possible to…
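
The idle-time argument in the abstract can be illustrated with a toy simulation. The sketch below assumes a ring of nodes, one model partition per node, one time step per partition, and zero communication cost. It treats the last node as completing a token and then sends the sequence back to the first node. The function name `simulate` and these simplifications are assumptions made for illustration. The node roles and scheduling in the paper may differ.

```python
from collections import deque


def simulate(n_nodes: int, n_sequences: int, n_tokens: int) -> tuple[int, float]:
    """Return (time steps used, mean node utilization) for a toy ring pipeline."""
    queues = [deque() for _ in range(n_nodes)]
    queues[0].extend(range(n_sequences))       # every sequence starts at node 0
    generated = [0] * n_sequences              # tokens produced per sequence
    steps = busy = 0
    while any(queues):
        moves = []                             # (sequence, destination node)
        for node, queue in enumerate(queues):
            if not queue:
                continue                       # this node idles for one step
            seq = queue.popleft()
            busy += 1
            if node < n_nodes - 1:
                moves.append((seq, node + 1))  # forward activations to next node
            else:
                generated[seq] += 1            # last partition completes a token
                if generated[seq] < n_tokens:
                    moves.append((seq, 0))     # loop back: the "recurrent" step
        for seq, dest in moves:                # deliver after all nodes have run
            queues[dest].append(seq)
        steps += 1
    return steps, busy / (steps * n_nodes)


if __name__ == "__main__":
    for k in (1, 3):
        steps, util = simulate(n_nodes=3, n_sequences=k, n_tokens=4)
        print(f"{k} sequence(s): {steps} steps, utilization {util:.2f}")
```

In this toy model, with three nodes and four tokens per sequence, a single sequence takes 12 steps and keeps each node busy one third of the time. Three sequences generated together finish in 14 steps with utilization 6/7, because each node works on a different sequence while the others are in flight.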

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques