TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs
Minjae Lee, Wonjun Kang, Byeongkeun Ahn, Christian Classen, Kevin Galim, Seunghyuk Oh, Minghao Yan, Hyung Il Koo, Kangwook Lee

TL;DR
TABED introduces a dynamic ensemble approach for speculative decoding in LVLMs, significantly improving inference speed and robustness without additional training, by leveraging deviations from past ground truths.
Contribution
It proposes TABED, a training-free, plug-and-play method that adaptively ensembles drafts during inference to enhance speed and robustness in LVLMs.
Findings
Achieves 1.74x speedup over autoregressive decoding.
Provides 5% accuracy improvement over single drafting methods.
Remains training-free with negligible ensembling costs.
Abstract
Speculative decoding (SD) has proven effective for accelerating LLM inference by quickly generating draft tokens and verifying them in parallel. However, SD remains largely unexplored for Large Vision-Language Models (LVLMs), which extend LLMs to process both image and text prompts. To address this gap, we benchmark existing inference methods with small draft models on 11 datasets across diverse input scenarios and observe scenario-specific performance fluctuations. Motivated by these findings, we propose Test-time Adaptive Batched Ensemble Drafting (TABED), which dynamically ensembles multiple drafts obtained via batch inference by leveraging deviations from past ground truths available in the SD setting. The dynamic ensemble method achieves an average robust walltime speedup of 1.74x over autoregressive decoding and a 5% improvement over single drafting methods, while remaining…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
