WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference

Zixuan Liu, Zhiyong Chen, Nan Xue, Shengkang Chen, Jiangchao Yao, Meixia Tao, and Wenjun Zhang

arXiv:2604.17701·cs.IT·April 21, 2026


TL;DR

WISV introduces a wireless-aware semantic verification method for distributed speculative decoding in device-edge LLM inference, increasing accepted sequence length and reducing interaction rounds and end-to-end latency under fluctuating wireless conditions.

Contribution

The paper proposes WISV, a semantic verification framework that goes beyond strict token-level matching by combining a channel-aware acceptance policy with tailored communication protocols.

Findings

01. Up to 60.8% increase in accepted sequence length.

02. 37.3% reduction in interaction rounds.

03. 31.4% improvement in end-to-end latency.

Abstract

While distributed device-edge speculative decoding enhances resource utilization across heterogeneous nodes, its performance is often bottlenecked by conventional token-level verification strategies. Such rigid alignment leads to excessive rejections, significantly diminishing the accepted sequence length and increasing interaction rounds under fluctuating wireless conditions. In this paper, we propose WISV (Wireless-Informed Semantic Verification), a novel distributed speculative decoding framework that goes beyond strict token-level matching via a channel-aware semantic acceptance policy. WISV integrates a lightweight decision head into the edge-side target LLM to dynamically evaluate speculative tokens by synthesizing high-dimensional hidden representations with instantaneous channel state information (CSI). To optimize the trade-off between verification fidelity and communication…
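
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It compares strict token-level verification with a relaxed, channel-aware acceptance rule. Everything named here is a hypothetical stand-in: the logistic `decision_head_score`, its `weights`, `snr_weight`, `bias`, and `threshold` parameters, and the use of a single SNR value in dB as the CSI input. In particular, the sign and size of `snr_weight` are placeholders; the abstract does not say how the channel state shifts the acceptance decision.

```python
import math
from typing import List, Sequence


def accepted_prefix_len(accept: Sequence[bool]) -> int:
    """Length of the longest all-accepted prefix of a draft sequence."""
    n = 0
    for ok in accept:
        if not ok:
            break
        n += 1
    return n


def token_level_accept(draft: Sequence[int], target: Sequence[int]) -> List[bool]:
    """Strict verification: a draft token is accepted only on an exact match."""
    return [d == t for d, t in zip(draft, target)]


def decision_head_score(hidden: Sequence[float], snr_db: float,
                        weights: Sequence[float], snr_weight: float,
                        bias: float) -> float:
    """Hypothetical lightweight head: a logistic score over hidden features
    plus one CSI feature (SNR in dB). An assumption for illustration only."""
    z = sum(w * h for w, h in zip(weights, hidden)) + snr_weight * snr_db + bias
    return 1.0 / (1.0 + math.exp(-z))


def semantic_accept(draft: Sequence[int], target: Sequence[int],
                    hiddens: Sequence[Sequence[float]], snr_db: float,
                    weights: Sequence[float], snr_weight: float,
                    bias: float, threshold: float = 0.5) -> List[bool]:
    """Relaxed verification: exact matches pass; mismatches pass only if the
    (hypothetical) decision head scores them at or above the threshold."""
    decisions = []
    for d, t, h in zip(draft, target, hiddens):
        if d == t:
            decisions.append(True)
        else:
            score = decision_head_score(h, snr_db, weights, snr_weight, bias)
            decisions.append(score >= threshold)
    return decisions


if __name__ == "__main__":
    draft = [1, 2, 3, 4]
    target = [1, 2, 9, 4]          # one mismatch at position 2
    hiddens = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
    weights, snr_weight, bias = [2.0, 0.0], -0.1, 0.0  # placeholder values

    strict = accepted_prefix_len(token_level_accept(draft, target))
    relaxed = accepted_prefix_len(
        semantic_accept(draft, target, hiddens, 5.0, weights, snr_weight, bias))
    print("token-level accepted length:", strict)
    print("semantic accepted length at 5 dB:", relaxed)
```

With these placeholder values, strict matching stops at the first mismatch, while the relaxed rule can carry the accepted prefix past it when the head's score clears the threshold. A longer accepted prefix per round is what reduces the number of device-edge interaction rounds.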
