Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding
Tao Jin, Phuong Minh Nguyen, Naoya Inoue

TL;DR
GOOSE introduces anisotropic speculation trees for training-free speculative decoding, speeding up inference by arranging candidate tokens according to how reliably each token source is accepted.
Contribution
The paper proposes an anisotropic tree structure that places candidates from a reliable source in a deep chain and candidates from an unreliable source in wide branches, speeding up speculative decoding without additional training.
Findings
GOOSE achieves 1.9-4.3x speedup across five LLMs and benchmarks.
Anisotropic trees outperform balanced trees by 12-33% under the same verification budget.
When the token sources differ in acceptance rate, the optimal tree places reliable tokens in a deep chain and spreads unreliable tokens as wide branches.
Abstract
Speculative decoding accelerates large language model inference by drafting multiple candidate tokens and verifying them in a single forward pass. Candidates are organized as a tree: deeper trees accept more tokens per step, but adding depth requires sacrificing breadth (fallback options) under a fixed verification budget. Existing training-free methods draft from a single token source and shape their trees without distinguishing candidate quality across origins. We observe that two common training-free token sources - n-gram matches copied from the input context, and statistical predictions from prior forward passes - differ dramatically in acceptance rate (~6x median gap, range 2-18x across five models and five benchmarks). We prove that when such a quality gap exists, the optimal tree is anisotropic (asymmetric): reliable tokens should form a deep chain while unreliable tokens spread…
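The depth-versus-breadth trade-off described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's algorithm or proof. It assumes a simplified acceptance model in which each candidate has a fixed conditional acceptance probability given that its parent was accepted, and siblings are mutually exclusive guesses for the same position, so the expected number of accepted tokens is the sum of path probabilities over all nodes. The function names, the tree shapes, and the probabilities 0.6 (reliable source) and 0.1 (unreliable source) are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple

# A node is (parent_index, p): parent_index is -1 for a child of the current
# context, and p is the assumed probability that the node is accepted given
# that its parent was accepted. Parents must appear before their children.
Node = Tuple[int, float]


def expected_accepted(tree: List[Node]) -> float:
    """Expected number of accepted tokens under the toy model."""
    reach: List[float] = []  # probability that each node is accepted
    total = 0.0
    for parent, p in tree:
        parent_reach = 1.0 if parent < 0 else reach[parent]
        reach.append(parent_reach * p)
        total += reach[-1]
    return total


def chain_with_fallbacks(depth: int, budget: int,
                         p_chain: float, p_side: float) -> List[Node]:
    """Anisotropic shape: a deep chain of reliable tokens, with the spare
    budget spent on unreliable sibling leaves, one per level from the top."""
    tree: List[Node] = []
    spine_parents: List[int] = []
    parent = -1
    for _ in range(depth):
        spine_parents.append(parent)
        tree.append((parent, p_chain))
        parent = len(tree) - 1
    for i in range(budget - depth):
        tree.append((spine_parents[i % depth], p_side))
    return tree


def balanced_tree(depth: int, branching: int,
                  p_first: float, p_rest: float) -> List[Node]:
    """Balanced shape: every node is expanded with the same branching factor,
    regardless of which source its candidates came from."""
    tree: List[Node] = []
    frontier = [-1]
    for _ in range(depth):
        next_frontier: List[int] = []
        for parent in frontier:
            for k in range(branching):
                tree.append((parent, p_first if k == 0 else p_rest))
                next_frontier.append(len(tree) - 1)
        frontier = next_frontier
    return tree


P_RELIABLE, P_UNRELIABLE = 0.6, 0.1  # hypothetical acceptance probabilities

# Both trees use the same verification budget of 14 nodes.
aniso = chain_with_fallbacks(depth=7, budget=14,
                             p_chain=P_RELIABLE, p_side=P_UNRELIABLE)
balanced = balanced_tree(depth=3, branching=2,
                         p_first=P_RELIABLE, p_rest=P_UNRELIABLE)

print(len(aniso), len(balanced))
print(expected_accepted(aniso), expected_accepted(balanced))
```

With these assumed probabilities the chain-plus-fallbacks shape scores about 1.70 expected accepted tokens against about 1.53 for the balanced shape at the same 14-node budget. The numbers only illustrate why a quality gap between sources favors an asymmetric tree in this toy model; they are not results from the paper.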
