MineDraft: A Framework for Batch Parallel Speculative Decoding
Zhenwei Tang, Arun Verma, Zijian Zhou, Zhaoxuan Wu, Alok Prakash, Daniela Rus, Bryan Kian Hsiang Low

TL;DR
MineDraft is a batch parallel speculative decoding framework that overlaps the drafting and verification stages, improving large language model inference throughput by up to 75% and end-to-end latency by up to 39%.
Contribution
The paper proposes MineDraft, a batch-parallel design for speculative decoding that maintains two batches of requests and overlaps drafting for one batch with verification for the other. The design is supported by theoretical analysis and a practical implementation.
Findings
Up to 75% throughput improvement
Up to 39% latency reduction
Drafting latency hidden by overlapping drafting with verification
Abstract
Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to propose draft tokens that are subsequently verified by a larger target model. However, the performance of standard SD is often limited by the strictly sequential execution of these drafting and verification stages. To address this, this paper proposes MineDraft, a batch parallel speculative decoding (PSD) framework designed to effectively hide drafting latency by overlapping it with verification. Our theoretical analysis shows that PSD is substantially more efficient than standard SD. MineDraft realizes PSD through a novel batch-parallel design that maintains two batches of requests, overlapping drafting for one batch with verification for the other. Our experimental results show that MineDraft significantly improves both throughput (up to 75%) and end-to-end latency (up to 39%)…
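To make the overlap idea in the abstract concrete, here is a minimal timing sketch in Python. It is a toy cost model, not the paper's analysis or implementation. It assumes fixed per-step costs `t_draft` and `t_verify`, exactly two batches, and that drafting and verification can run fully concurrently on separate resources. The function names and parameters are hypothetical.

```python
def sequential_sd_time(t_draft, t_verify, rounds):
    """Toy model of standard SD: each of two batches drafts, then verifies,
    one after the other, with no overlap between stages."""
    return 2 * rounds * (t_draft + t_verify)


def overlapped_psd_time(t_draft, t_verify, rounds):
    """Toy model of two-batch overlap: while one batch is being verified,
    the other batch is being drafted (assumed to run concurrently)."""
    # The very first draft has nothing to overlap with.
    total = t_draft
    verify_slots = 2 * rounds  # one verification per batch per round
    for slot in range(verify_slots):
        if slot == verify_slots - 1:
            # Final verification: no further drafting is needed.
            total += t_verify
        else:
            # Verification of one batch overlaps drafting of the other,
            # so the slot lasts as long as the slower of the two.
            total += max(t_draft, t_verify)
    return total


if __name__ == "__main__":
    # Hypothetical costs: drafting takes 1 unit, verification takes 4.
    seq = sequential_sd_time(1, 4, rounds=10)
    par = overlapped_psd_time(1, 4, rounds=10)
    print(f"sequential={seq} overlapped={par} speedup={seq / par:.2f}x")
```

In this model, when drafting is cheaper than verification, its cost is hidden in every slot except the first, so the total approaches the verification time alone. Real systems add effects the sketch ignores, such as variable acceptance rates, scheduling overhead, and resource contention.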
Taxonomy
Topics: Natural Language Processing Techniques · Machine Learning and Algorithms · Generative Adversarial Networks and Image Synthesis
