Mamba Drafters for Speculative Decoding
Daewon Choi, Seunghyuk Oh, Saket Dingliwal, Jihoon Tack, Kyuyoung Kim, Woomin Song, Seojin Kim, Insu Han, Jinwoo Shin, Aram Galstyan, Shubham Katiyar, Sravan Babu Bodapati

TL;DR
This paper introduces Mamba drafters: speculative-decoding drafters built on the Mamba state space model (SSM). They draft faster and use less memory than existing external drafters, perform comparably to state-of-the-art self-speculation methods, and remain usable across different target models.
Contribution
The paper presents Mamba-based drafters that combine the flexibility of external drafters with the efficiency of self-speculation. Because SSMs process a sequence in linear time, these drafters avoid the quadratic complexity of Transformer-based drafting.
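To illustrate why a linear recurrence keeps drafting cheap, here is a minimal sketch of a generic diagonal linear SSM step. It is not the paper's drafter and not Mamba itself: Mamba makes its parameters input-dependent (selective), whereas this sketch assumes fixed, hypothetical parameters `a`, `b`, and `c`. The point it shows is only that each new token updates a fixed-size state, so per-token cost and memory do not grow with sequence length.

```python
def ssm_step(state, x, a, b, c):
    """One step of a diagonal linear SSM (illustrative, non-selective).

    h <- a * h + b * x   (elementwise over the state dimension)
    y  = sum(c * h)
    """
    new_state = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]
    y = sum(c_i * h_i for c_i, h_i in zip(c, new_state))
    return new_state, y


def run_ssm(xs, a, b, c):
    """Process a scalar input sequence; the state size stays len(a) throughout."""
    state = [0.0] * len(a)
    ys = []
    for x in xs:
        state, y = ssm_step(state, x, a, b, c)
        ys.append(y)
    return ys, state


# Hypothetical one-dimensional parameters, chosen only for the example.
ys, final_state = run_ssm([1.0, 1.0, 1.0], a=[0.5], b=[1.0], c=[2.0])
```

By contrast, a Transformer drafter attends over all previous tokens at each step, so its cache and per-step work grow with the length of the context.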
Findings
Mamba drafters outperform existing external drafting methods.
They are comparable to state-of-the-art self-speculation approaches.
They require less memory and maintain cross-model adaptability.
Abstract
Speculative decoding has emerged as a promising approach to accelerating large language model (LLM) generation using a fast drafter while maintaining alignment with the target model's distribution. However, existing approaches face a trade-off: external drafters offer flexibility but can suffer from slower drafting, while self-speculation methods use drafters tailored to the target model but require re-training. In this paper, we introduce novel drafters based on Mamba, a state-of-the-art state space model (SSM), as a solution that combines the best aspects of both approaches. By leveraging the linear structure of SSMs, our approach avoids the quadratic complexity inherent in traditional Transformer-based methods, enabling faster drafting and lower memory usage while maintaining the flexibility to work across different target models. We further enhance efficiency with a novel test-time…
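As a rough illustration of how speculative decoding stays aligned with the target model's distribution, the sketch below implements the standard speculative-sampling acceptance rule for one block of drafted tokens. It is a generic sketch, not the paper's implementation: the function name `verify_draft` and the dictionary-based probability tables are assumptions made for the example, and the bonus token normally sampled from the target when every draft is accepted is omitted for brevity.

```python
import random


def verify_draft(draft_tokens, draft_probs, target_probs, rng=random):
    """Accept or reject drafted tokens with the standard speculative-sampling rule.

    draft_tokens: token ids proposed by the drafter.
    draft_probs[i], target_probs[i]: dicts mapping token -> probability at position i.
    Returns the accepted prefix, plus one resampled token if a draft is rejected.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        q = draft_probs[i][tok]           # drafter's probability of its own proposal
        p = target_probs[i].get(tok, 0.0)  # target's probability of the same token
        if rng.random() < min(1.0, p / q):
            accepted.append(tok)
            continue
        # Rejected: resample from the residual distribution max(0, p - q), normalized.
        residual = {
            t: max(0.0, pt - draft_probs[i].get(t, 0.0))
            for t, pt in target_probs[i].items()
        }
        tokens = list(residual)
        weights = [residual[t] for t in tokens]
        accepted.append(rng.choices(tokens, weights=weights, k=1)[0])
        break  # everything after the first rejection is discarded
    return accepted
```

When the drafter's distribution matches the target's, every token is accepted; the further the two diverge, the earlier a block tends to be cut off, which is why drafter quality and drafting speed both matter.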
Taxonomy
Topics: Benford’s Law and Fraud Detection
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
