UniVer: A Unified Perspective for Multi-step and Multi-draft Speculative Decoding
Yepeng Weng, Qiao Hu, Takehisa Yairi

TL;DR
UniVer introduces a unified optimal transport (OT)-based verification method for speculative decoding that jointly optimizes the multi-step and multi-draft aspects, improving efficiency while maintaining model fidelity.
Contribution
It proposes a novel conditional OT framework and algorithm that jointly optimizes tree-based verification in speculative decoding, unifying previously isolated approaches.
Findings
UniVer increases acceptance length by 4.2% to 8.5%.
It maintains exact distributional alignment with the target model.
The method is effective across various tasks and models.
Abstract
Speculative decoding accelerates Large Language Models via draft-then-verify, where verification can be framed as an Optimal Transport (OT) problem. Existing approaches typically handle multi-draft and multi-step aspects in isolation, applying either flat OT to single-step drafts or per-token rejection sampling to tree-structured candidates. This separation leaves the joint regime (where multi-step dependencies meet multi-draft branching) poorly optimized, as local verification rules fail to exploit the coupling between horizontal and vertical dimensions of candidate trees. In this paper, we propose a unified perspective that casts tree-based verification as a conditional OT problem. Our key insight is that vertical dependencies can be abstracted through prefix acceptance probabilities, which act as dynamic scaling factors to actively guide horizontal draft selection. Based on this…
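For reference, the per-token rejection sampling baseline that the abstract contrasts with can be sketched in a few lines. This is the standard single-draft, single-step speculative sampling rule, not UniVer's conditional OT algorithm. The function names `verify_token` and `output_distribution` and the toy distributions are hypothetical illustrations. The sketch also shows where the OT view comes from: the acceptance rate sum_x min(p(x), q(x)) = 1 - TV(p, q) is the largest acceptance probability any coupling of the draft distribution q with the target distribution p can achieve (an OT problem with 0-1 cost). The rejection step then restores the target distribution exactly.

```python
import numpy as np

def verify_token(p, q, x, rng):
    """Standard per-token rejection sampling (illustrative baseline, not UniVer).
    p: target-model distribution, q: draft-model distribution,
    x: token index sampled from q. Returns (emitted token, accepted?)."""
    accept_prob = min(1.0, float(p[x] / q[x]))
    if rng.random() < accept_prob:
        return int(x), True
    # On rejection, resample from the normalized residual max(0, p - q).
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(p), p=residual)), False

def output_distribution(p, q):
    """Exact distribution of the emitted token under the rule above."""
    accepted = np.minimum(p, q)   # mass kept when the draft token is accepted
    beta = accepted.sum()         # acceptance rate = 1 - TV(p, q)
    residual = np.maximum(p - q, 0.0)
    if residual.sum() > 0:
        residual = residual / residual.sum()
    # min(p, q) + (1 - beta) * normalized residual equals p elementwise.
    return accepted + (1.0 - beta) * residual, beta

# Hypothetical toy distributions over a 3-token vocabulary.
p = np.array([0.5, 0.3, 0.2])  # target model
q = np.array([0.2, 0.5, 0.3])  # draft model
dist, beta = output_distribution(p, q)
print("emitted-token distribution:", dist, "acceptance rate:", beta)
```

Because min(p, q) + max(p - q, 0) = p for every token, the emitted token follows the target distribution exactly; this is the fidelity guarantee that tree-based and multi-draft verification schemes must preserve while trying to raise the acceptance rate.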
