GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference
Phuong Tran, Tzu-Hao Liu, Long Tan Le, Tung-Anh Nguyen, Van Quan La, Eason Yu, Han Shu, Choong Seon Hong, Nguyen H. Tran

TL;DR
GoodSpeed is a distributed inference framework that enhances the efficiency and fairness of speculative decoding for large language models by dynamically coordinating multiple draft servers with a central verification server.
Contribution
It introduces a novel adaptive speculative decoding framework with a gradient scheduling algorithm to optimize goodput and fairness in distributed LLM inference.
Findings
Converges to optimal goodput allocation in steady state
Maintains near-optimal performance under dynamic workloads
Provides scalable and fair distributed inference for LLMs
Abstract
Large language models (LLMs) have revolutionized natural language processing, yet their high computational demands pose significant challenges for real-time inference, especially in multi-user server speculative decoding and resource-constrained environments. Speculative decoding has emerged as a promising technique to accelerate LLM inference by using lightweight draft models to generate candidate tokens, which are subsequently verified by a larger, more accurate model. However, ensuring both high goodput (the effective rate of accepted tokens) and fairness across multiple draft servers cooperating with a central verification server remains an open challenge. This paper introduces GOODSPEED, a novel distributed inference framework that optimizes goodput through adaptive speculative decoding. GOODSPEED employs a central verification server that coordinates a set of heterogeneous draft…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Big Data and Digital Economy · Advanced Neural Network Applications
