ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge-Cloud Speculative LLM Serving
Xiangchen Li, Saeid Ghafouri, Jiakun Fan, Babar Ali, Hans Vandierendonck, Dimitrios S. Nikolopoulos

TL;DR
ConfigSpec is a profiling-based framework for optimizing distributed speculative LLM serving across diverse edge devices and configurations, balancing throughput, cost, and energy efficiency.
Contribution
It introduces a profiling method to navigate the complex configuration space for distributed speculative LLM inference, revealing conflicting optimal points for different objectives.
Findings
Goodput is maximized by the smallest, fastest draft model, at device-dependent speculative lengths.
Cost and energy efficiency favor different draft sizes due to a bonus-token effect.
No single configuration optimizes all objectives, highlighting the need for profiling-based selection.
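The trade-off described in these findings can be illustrated with a minimal sketch. It is not the paper's model: it assumes a constant, independent per-token acceptance rate `alpha`, uses the standard expected-length formula for speculative decoding (which counts the bonus token emitted by the verifier), and treats tokens per verification call as a stand-in for cost efficiency. The `Config` fields, the fixed `verify_s` latency, and all numbers are hypothetical placeholders for values a profiler would measure.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    draft: str        # hypothetical draft-model label
    k: int            # speculative length (draft tokens per round)
    alpha: float      # assumed per-token acceptance rate
    draft_tps: float  # assumed drafting throughput on the device, tokens/s
    power_w: float    # assumed device power while drafting, watts


def expected_tokens(alpha: float, k: int) -> float:
    """Expected tokens per verification round, bonus token included,
    assuming each draft token is accepted independently with prob. alpha."""
    if alpha >= 1.0:
        return k + 1.0
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def metrics(c: Config, verify_s: float = 0.05) -> dict:
    """Toy objectives; verify_s is an assumed fixed verification latency."""
    draft_s = c.k / c.draft_tps
    tokens = expected_tokens(c.alpha, c.k)
    return {
        "goodput": tokens / (draft_s + verify_s),      # tokens per second
        "cost_eff": tokens,                            # tokens per verification call
        "energy_eff": tokens / (c.power_w * draft_s),  # tokens per joule on the edge
    }


def best_per_objective(configs, verify_s: float = 0.05) -> dict:
    """Return the highest-scoring configuration for each objective."""
    scored = [(c, metrics(c, verify_s)) for c in configs]
    return {
        name: max(scored, key=lambda cm: cm[1][name])[0]
        for name in ("goodput", "cost_eff", "energy_eff")
    }


# Made-up profiles: (acceptance rate, tokens/s, watts) per draft model.
PROFILES = {"small": (0.6, 200.0, 5.0), "large": (0.8, 60.0, 8.0)}
CONFIGS = [
    Config(name, k, *PROFILES[name])
    for name, k in product(PROFILES, (2, 4, 8))
]

if __name__ == "__main__":
    for objective, cfg in best_per_objective(CONFIGS).items():
        print(f"{objective}: draft={cfg.draft}, k={cfg.k}")
```

Even in this toy setting the objectives need not agree on a single configuration: a larger draft model with a higher acceptance rate yields more tokens per verification call, while a smaller, faster one can win on throughput. Real selection would replace the placeholder profiles with measured drafting throughput, acceptance rate, and power.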
Abstract
Speculative decoding enables collaborative Large Language Model (LLM) inference across cloud and edge by separating lightweight token drafting from heavyweight verification. While prior systems show performance and cost benefits, practical deployment requires navigating a large configuration space spanning draft model variants, quantisation levels, speculative lengths, and heterogeneous edge devices. This paper presents ConfigSpec, a configuration-selection framework for distributed speculative LLM serving. ConfigSpec profiles edge devices and draft-target alignment, and models drafting throughput, acceptance rate, and power to evaluate goodput, verification cost efficiency, and energy efficiency across the joint configuration space. Our analysis across three edge platforms and two LLM families reveals structurally conflicting optima. Firstly, goodput is maximised by the smallest,…
