Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
Minghan Li, Xilun Chen, Ari Holtzman, Beidi Chen, Jimmy Lin, Wen-tau Yih, Xi Victoria Lin

TL;DR
NEST is a semi-parametric language modeling method that improves generation quality, attribution, and inference speed by integrating real-world text spans through token-level retrieval and speculative decoding.
Contribution
The paper introduces NEST, a novel semi-parametric decoding approach that incorporates arbitrary-length text spans and enhances speed and attribution in LLM generation.
Findings
NEST outperforms kNN-LM in quality and attribution.
NEST achieves 1.8x faster inference on Llama-2-Chat 70B.
NEST performs competitively with in-context retrieval methods.
Abstract
Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, approach these limitations by refining the output of an LM for a given prompt using its nearest neighbor matches in a non-parametric data store. However, these models often exhibit slow inference speeds and produce non-fluent texts. In this paper, we introduce Nearest Neighbor Speculative Decoding (NEST), a novel semi-parametric language modeling approach that is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources. NEST performs token-level retrieval at each inference step to compute a semi-parametric mixture distribution and identify promising span continuations in a corpus. It then uses an approximate speculative decoding procedure that accepts a prefix of…
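The semi-parametric mixture mentioned in the abstract can be illustrated with a kNN-LM-style interpolation between the LM's next-token distribution and a distribution built from retrieved neighbors. The following is a minimal sketch, not the authors' implementation: the function names, the fixed interpolation weight `lam`, the distance-softmax weighting, and the toy three-token vocabulary are all assumptions made for illustration, and the span selection and speculative acceptance steps of NEST are not shown.

```python
import math

def knn_distribution(neighbors, vocab_size, temperature=1.0):
    """Turn retrieved neighbors into a next-token distribution.

    neighbors: list of (distance, next_token_id) pairs (hypothetical format).
    Closer neighbors get more weight via a softmax over negative distances.
    """
    scores = [-dist / temperature for dist, _ in neighbors]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for weight, (_, token_id) in zip(weights, neighbors):
        probs[token_id] += weight / total  # neighbors sharing a token pool their mass
    return probs

def mix(p_lm, p_knn, lam):
    """Interpolate: lam * p_knn + (1 - lam) * p_lm (lam is fixed here by assumption)."""
    return [lam * k + (1.0 - lam) * l for l, k in zip(p_lm, p_knn)]

# Toy example: the LM prefers token 0, but the retrieved neighbors favor token 1.
p_lm = [0.5, 0.3, 0.2]
neighbors = [(0.0, 1), (0.0, 1), (0.0, 2)]
p_knn = knn_distribution(neighbors, vocab_size=3)
p_mix = mix(p_lm, p_knn, lam=0.5)
```

In this toy case the retrieved neighbors shift the most likely token from 0 to 1 while the mixture remains a valid probability distribution. A fixed `lam` is the simplest choice for a sketch; it is a tunable hyperparameter in this illustration, not a value taken from the paper.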
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Advanced Computational Techniques and Applications
Methods: Balanced Selection · NesT
