DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving
Fengze Yu, Leshu Li, Brad McDanel, Sai Qian Zhang

TL;DR
This paper introduces DSD, a distributed speculative decoding framework that enhances large language model inference across edge-cloud environments by reducing latency and increasing scalability through coordinated multi-device execution.
Contribution
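The paper presents DSD, a distributed speculative decoding approach, together with DSD-Sim, a discrete-event simulator for this setting, and an Adaptive Window Control (AWC) policy that adjusts the speculation window size at run time, enabling scalable LLM serving across heterogeneous edge and cloud systems.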
Findings
DSD achieves up to 1.1x speedup over existing speculative decoding (SD) baselines.
DSD achieves 9.7% higher throughput than existing SD baselines.
The framework effectively enables scalable LLM inference in edge-cloud settings.
Abstract
Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments. Existing speculative decoding (SD) techniques accelerate token generation but remain confined to single-node execution. We propose DSD, a distributed speculative decoding framework that extends SD to multi-device deployments through coordinated draft-target execution. Given the lack of prior work on simulating this paradigm, we first introduce DSD-Sim, a discrete-event simulator that captures network, batching, and scheduling dynamics. Building on insights from DSD-Sim, we further design an Adaptive Window Control (AWC) policy that dynamically adjusts speculation window size to optimize throughput. Experiments across diverse workloads show that DSD achieves up to 1.1x speedup and 9.7% higher throughput over existing SD baselines, enabling…
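The abstract does not spell out how AWC picks a window size, so the following is only a toy sketch of why the best speculation window can depend on network delay, not the authors' policy. It assumes a constant per-token acceptance probability `alpha` and uses the standard geometric-sum estimate of tokens produced per draft-verify round. The latency parameters (`draft_ms`, `verify_ms`, `rtt_ms`) and the function names are hypothetical.

```python
def expected_tokens_per_round(alpha: float, window: int) -> float:
    """Expected tokens emitted in one draft-verify round.

    Assumes each drafted token is accepted independently with
    probability alpha; the target model always contributes one token.
    """
    if alpha >= 1.0:
        return float(window + 1)
    return (1.0 - alpha ** (window + 1)) / (1.0 - alpha)


def round_latency_ms(window: int, draft_ms: float, verify_ms: float, rtt_ms: float) -> float:
    """Hypothetical cost model: sequential drafting, one verification pass, one network round trip."""
    return window * draft_ms + verify_ms + rtt_ms


def best_window(alpha: float, draft_ms: float, verify_ms: float, rtt_ms: float, max_window: int = 16) -> int:
    """Pick the window that maximizes expected tokens per millisecond under the toy model."""
    return max(
        range(1, max_window + 1),
        key=lambda w: expected_tokens_per_round(alpha, w)
        / round_latency_ms(w, draft_ms, verify_ms, rtt_ms),
    )


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    for rtt in (0.0, 20.0, 100.0):
        print(f"rtt={rtt:>5.1f} ms -> window={best_window(0.8, 5.0, 30.0, rtt)}")
```

Under this model, a larger round-trip time favors a larger window, because each verification round pays the network cost once regardless of how many tokens were drafted. A real policy would also have to account for batching and scheduling effects, which the abstract says DSD-Sim captures.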
Taxonomy
Topics: IoT and Edge/Fog Computing · Big Data and Digital Economy · Advanced Neural Network Applications
