An Empirical Study of Speculative Decoding on Software Engineering Tasks
Yijia Li, Junkai Chen, Xing Hu, Xin Xia

TL;DR
This paper empirically evaluates how effectively Speculative Decoding (SD) accelerates Large Language Model inference on Software Engineering (SE) tasks, and shows that its gains vary across task scenarios and model sizes.
Contribution
The paper provides the first systematic benchmark of SD strategies on SE tasks and offers practical guidelines for inference acceleration.
Findings
SD accelerates inference more for smaller models.
Model-based SD suits code generation; model-free SD suits repair/editing.
Repetitiveness in SE tasks enhances model-free SD performance.
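To make the link between repetitiveness and model-free SD concrete, here is a minimal sketch of one model-free drafting idea: copy the tokens that followed the most recent earlier occurrence of the current suffix, then keep only the tokens the target model agrees with. This summary does not say which model-free methods the paper benchmarks, so the n-gram lookup scheme and every name below (`draft_from_context`, `speculative_step`, `target_next`) are illustrative assumptions, not the authors' implementation. Verification is simulated token by token for clarity; a real system would check the whole draft in a single forward pass.

```python
from typing import Callable, List, Sequence


def draft_from_context(tokens: Sequence[str], ngram: int = 2,
                       max_draft: int = 4) -> List[str]:
    """Hypothetical model-free drafter: find the most recent earlier
    occurrence of the last `ngram` tokens and copy what followed it."""
    if len(tokens) < ngram:
        return []
    suffix = list(tokens[-ngram:])
    # Scan right to left over earlier positions, skipping the suffix itself.
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if list(tokens[start:start + ngram]) == suffix:
            return list(tokens[start + ngram:start + ngram + max_draft])
    return []


def speculative_step(tokens: Sequence[str],
                     target_next: Callable[[List[str]], str],
                     ngram: int = 2, max_draft: int = 4) -> List[str]:
    """One greedy draft-then-verify step. `target_next` stands in for the
    target model's greedy next-token choice (an assumption of this sketch).
    The returned tokens match what plain greedy decoding would produce."""
    accepted: List[str] = []
    for tok in draft_from_context(tokens, ngram, max_draft):
        expected = target_next(list(tokens) + accepted)
        if tok != expected:
            # First mismatch: discard the rest of the draft and keep
            # the target model's own token instead.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Whole draft accepted (or no draft found): add one target token.
    accepted.append(target_next(list(tokens) + accepted))
    return accepted
```

In this sketch, a repetitive context lets a single step accept several tokens at once, while a context with no useful repetition falls back to one token per step, which is the intuition behind the repetitiveness finding above.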
Abstract
Large Language Models (LLMs) have become widely used for Software Engineering (SE) tasks, spanning from function-level code generation to complex repository-level workflows. However, the high latency of autoregressive inference remains a significant bottleneck, hindering their deployment in interactive environments. While Speculative Decoding (SD) offers a promising technique for lossless acceleration, prior research on long-context repository-level tasks and complex agentic interactions remains limited. To bridge this gap, we present the first systematic empirical study to evaluate the effectiveness of SD in SE tasks. We systematically benchmark a comprehensive spectrum of strategies, encompassing both model-based and model-free methods, across representative generation, editing, and repair scenarios. Our empirical results indicate that SD demonstrates clear potential for accelerating…
