Training-free Dropout Sampling for Semantic Token Acceptance in Speculative Decoding

Jeongtae Lee; Minjung Jo; Hyunjoon Jeong; Gunho Park; Sunghyeon Woo; Joonghoon Kim; Se Jung Kwon; Dongsoo Lee

arXiv:2603.03333 · cs.CL · March 5, 2026

Open Access

TL;DR

DropMatch is a training-free, data-free, and calibration-free method for speculative decoding. It applies Monte Carlo dropout to the target model's LM head to check draft tokens against the target model's predictive distribution, which raises token acceptance and speeds up large language model inference.

Contribution

It introduces DropMatch, an approach that enhances speculative decoding by matching draft tokens to the target model's predictive distribution without requiring model training or calibration.

Findings

01 Increases acceptance length in decoding.

02 Achieves 1.09x to 1.33x speedup over baseline.

03 Operates without model training or calibration.

Abstract

Speculative decoding accelerates large language model inference by proposing tokens with a lightweight draft model and selectively accepting them using a target model. This work introduces DropMatch, a novel approach that matches draft tokens to the predictive distribution of the target model via Monte Carlo dropout applied exclusively to the LM head, enabling sampling-based acceptance decisions. By generating multiple decoding paths, our method forms an empirical token distribution against which draft tokens are evaluated for consistency. This acceptance mechanism enables the model to adaptively control the size of decoding paths under an appropriate dropout probability, preventing substantial distortion of the target model predictive distribution. The proposed method operates in a training-free, data-free, and calibration-free manner, requires no architectural modification to…
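The acceptance idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the choice to apply dropout to the hidden state entering a linear LM head, the greedy (argmax) token per path, and the `min_support` acceptance threshold are all assumptions made for illustration, since the abstract does not specify the exact rule.

```python
import random
from collections import Counter


def lm_head_logits(hidden, weight, keep_mask, p_drop):
    """Linear LM head with inverted dropout on its input (assumed placement)."""
    scale = 1.0 / (1.0 - p_drop)  # requires p_drop < 1
    dropped = [h * scale if keep else 0.0 for h, keep in zip(hidden, keep_mask)]
    return [sum(w * x for w, x in zip(row, dropped)) for row in weight]


def mc_dropout_token_counts(hidden, weight, p_drop, num_paths, rng):
    """Run the LM head num_paths times with fresh dropout masks and count
    the greedy token of each path, giving an empirical token distribution."""
    counts = Counter()
    for _ in range(num_paths):
        mask = [rng.random() >= p_drop for _ in hidden]
        logits = lm_head_logits(hidden, weight, mask, p_drop)
        counts[max(range(len(logits)), key=logits.__getitem__)] += 1
    return counts


def accept_draft_token(draft_token, counts, min_support=1):
    """Hypothetical rule: accept if enough sampled paths agree with the draft."""
    return counts[draft_token] >= min_support


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [1.0, 0.5]                           # toy final hidden state
    weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy LM head, vocab size 3
    counts = mc_dropout_token_counts(hidden, weight, p_drop=0.5, num_paths=8, rng=rng)
    print(counts, accept_draft_token(0, counts))
```

With `p_drop=0.0` every path reproduces the target model's greedy token, so only exact matches are accepted; a larger dropout probability spreads the counts over more tokens, which is one way to read the abstract's remark about controlling the decoding paths without heavily distorting the predictive distribution.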

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification