CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding

Sophia Ho; Jinsol Park; Patrick Wang

arXiv:2408.04678·cs.CL·August 12, 2024

Open Access

TL;DR

CREST is a redesign of REST, a retrieval-based drafting technique for speculative decoding. It stores only a selective subset of n-grams in the datastore, reducing storage needs while matching or improving acceptance length on the HumanEval and MT Bench benchmarks.

Contribution

CREST stores only a subset of the smallest and most common n-grams, making the datastore more compact without reducing the accepted token length.

Findings

01

CREST matches REST's accepted token length while using 10.6-13.5x less storage space.

02

Given the same storage space, CREST achieves a 16.5-17.1% higher acceptance length than REST on the HumanEval and MT Bench benchmarks.

03

Storing only a subset of n-grams both reduces storage space and improves performance.

Abstract

We present CREST (Compact Retrieval-Based Speculative Decoding), a redesign of REST that allows it to be effectively "compacted". REST is a drafting technique for speculative decoding based on retrieving exact n-gram matches of the most recent n tokens generated by the target LLM from a datastore. The key idea of CREST is to only store a subset of the smallest and most common n-grams in the datastore with the hope of achieving comparable performance with less storage space. We found that storing a subset of n-grams both reduces storage space and improves performance. CREST matches REST's accepted token length with 10.6-13.5x less storage space and achieves a 16.5-17.1% higher acceptance length than REST using the same storage space on the HumanEval and MT Bench benchmarks.
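The idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`build_counts`, `compact`, `draft`), the `budget` parameter, and the ranking rule (shorter contexts first, then higher frequency) are assumptions chosen only to show what "keeping the smallest and most common n-grams" might look like for an exact-match n-gram datastore.

```python
from collections import Counter, defaultdict


def build_counts(corpus, max_n):
    """Map each context n-gram (up to max_n tokens) to counts of the next token."""
    table = defaultdict(Counter)
    for tokens in corpus:
        for i in range(1, len(tokens)):
            for n in range(1, max_n + 1):
                if i - n < 0:
                    break  # not enough preceding tokens for this n
                context = tuple(tokens[i - n:i])
                table[context][tokens[i]] += 1
    return dict(table)


def compact(table, budget):
    """Keep at most `budget` contexts (assumed rule: shortest first, then most frequent)."""
    ranked = sorted(
        table.items(),
        key=lambda kv: (len(kv[0]), -sum(kv[1].values())),
    )
    return dict(ranked[:budget])


def draft(store, recent, max_n, k=1):
    """Return up to k draft tokens for the longest stored suffix of `recent`."""
    for n in range(min(max_n, len(recent)), 0, -1):
        context = tuple(recent[-n:])
        if context in store:
            return [tok for tok, _ in store[context].most_common(k)]
    return []  # no match: the caller falls back to ordinary decoding
```

In this sketch, a lookup that misses on a long context falls back to a shorter suffix, so a compacted store that keeps short, frequent contexts can still produce drafts for many inputs. Whether that mirrors the paper's actual selection and retrieval procedure is not stated in the abstract.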

Taxonomy

Topics: Neural Networks and Applications