HALO: Semantic-Aware Distributed LLM Inference in Lossy Edge Network

Peirong Zheng; Wenchao Xu; Haozhao Wang; Jinyu Chen; Xuemin Shen

arXiv:2601.11676·cs.DC·January 21, 2026

Open Access

TL;DR

HALO is a framework that speeds up distributed large language model (LLM) inference at the edge through semantic-aware synchronization and load balancing, reporting a 3.41x end-to-end speedup on a Raspberry Pi cluster under lossy network conditions.

Contribution

HALO introduces a semantic-aware predictor, parallel neuron loading, and load balancing to improve distributed LLM inference in unreliable edge networks, reducing synchronization delays.

Findings

01 3.41x end-to-end speedup on a Raspberry Pi cluster

02 Maintains performance comparable to that under ideal network conditions

03 Outperforms existing methods in lossy network scenarios

Abstract

Deploying large language model (LLM) inference at the edge can improve service responsiveness while protecting user privacy. However, it is severely limited by the resource constraints of a single edge node. Distributed inference has emerged as a way to aggregate and leverage computational resources across multiple devices. Yet existing methods typically require strict synchronization, which is often infeasible under unreliable network conditions. In this paper, we propose HALO, a novel framework that accelerates distributed LLM inference in lossy edge networks. The core idea is to enable relaxed yet effective synchronization by strategically allocating less critical neuron groups to unstable devices, thus avoiding the excessive waiting time caused by delayed packets. HALO introduces three key mechanisms: (1) a semantic-aware predictor to assess the…
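The core idea stated in the abstract, placing less critical neuron groups on less stable devices, can be illustrated with a small greedy sketch. This is not HALO's algorithm, which the truncated abstract does not specify. The `Device` class, the `loss_rate` and `capacity` fields, the importance scores, and the `allocate_groups` function are all hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    loss_rate: float  # hypothetical: observed packet loss rate in [0, 1]
    capacity: int     # hypothetical: number of neuron groups the device can host


def allocate_groups(importance, devices):
    """Greedy illustration: the most important neuron groups go to the most
    reliable devices, so the least important ones land on unstable devices.

    importance: dict mapping group id -> importance score (assumed given).
    Returns a dict mapping group id -> device name.
    """
    # Order groups from most to least important.
    groups = sorted(importance, key=importance.get, reverse=True)
    # Order devices from most to least reliable (lowest loss rate first).
    ordered = sorted(devices, key=lambda d: d.loss_rate)
    if sum(d.capacity for d in ordered) < len(groups):
        raise ValueError("total device capacity is smaller than the number of groups")

    assignment = {}
    remaining = iter(groups)
    for dev in ordered:
        for _ in range(dev.capacity):
            group = next(remaining, None)
            if group is None:  # every group has been placed
                return assignment
            assignment[group] = dev.name
    return assignment


if __name__ == "__main__":
    scores = {"g0": 0.9, "g1": 0.1, "g2": 0.5, "g3": 0.3}
    cluster = [Device("pi-a", 0.20, 2), Device("pi-b", 0.01, 2)]
    print(allocate_groups(scores, cluster))
```

In this toy setup the two highest-scoring groups are assigned to the low-loss device `pi-b`, and the two lowest-scoring groups go to the lossier `pi-a`, so a delayed packet from `pi-a` affects only the groups assumed to matter least.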

Taxonomy

Topics: IoT and Edge/Fog Computing · Privacy-Preserving Technologies in Data · IoT Networks and Protocols