Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing

Jie Jiang; Xing Sun; Ruotian Chen; Jianan Su; Kaixin Shen

arXiv:2605.14978·cs.CL·May 18, 2026

TL;DR

This paper introduces PPOW, a reinforcement learning framework that optimizes the speculative-decoding drafter at the window level rather than the token level, with reported speedups of 3.39x to 4.36x across multiple models.

Contribution

PPOW uses reinforcement learning to shift drafter optimization from token-level imitation to window-level optimization, combining a Cost-Aware Speedup Reward, a Distribution-Based Proximity Reward, and Adaptive Divergence-Aware Windowing.

Findings

01 Achieves average acceptance lengths of 6.29 to 6.52 tokens.

02 Realizes speedups of 3.39x to 4.36x across multiple models.

03 Demonstrates that practical window-level optimization improves decoding efficiency.

Abstract

Speculative decoding accelerates LLM inference by having a lightweight draft model propose speculative windows of candidate tokens for parallel verification by a larger target model. In practice, speculative efficiency is often bottlenecked by hard-to-draft positions, where an early mismatch truncates the accepted prefix and invalidates the rest of the speculative window. Most learning-based drafters are still optimized with token-level supervised objectives, even though speculative utility is inherently window-level and prefix-sensitive. We propose PPOW (Performance-Driven Policy Optimization with Adaptive Windowing), a reinforcement learning framework that shifts drafter optimization from token-level imitation to window-level optimization. PPOW combines a Cost-Aware Speedup Reward, a Distribution-Based Proximity Reward, and Adaptive Divergence-Aware Windowing, which prioritizes…
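
The prefix sensitivity described in the abstract can be illustrated with a small sketch. It is not the paper's method: it assumes a simplified greedy verification rule (a draft token is accepted only if it equals the target model's token at the same position), and the function name `accepted_prefix_length` and the token IDs are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def accepted_prefix_length(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count leading draft tokens that equal the target's tokens.

    Assumption: greedy, exact-match verification. Real systems may use
    probabilistic acceptance instead; this is only a simplified model.
    """
    accepted = 0
    for draft_token, target_token in zip(draft, target):
        if draft_token != target_token:
            break  # the first mismatch invalidates the rest of the window
        accepted += 1
    return accepted


# Hypothetical token IDs: the draft is wrong only at position 3.
draft_window = [11, 42, 7, 99, 5, 8]
target_tokens = [11, 42, 7, 13, 5, 8]

# Window-level utility: how many tokens are actually accepted.
accepted = accepted_prefix_length(draft_window, target_tokens)

# Token-level view: how many positions match, ignoring order of failure.
matching_positions = sum(d == t for d, t in zip(draft_window, target_tokens))

print(accepted, matching_positions)
```

In this toy window the draft matches the target at five of six positions, yet only the first three tokens are accepted because the single mismatch falls early. That gap between per-token accuracy and accepted length is the reason the abstract calls speculative utility window-level and prefix-sensitive.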
