SAM Decoding: Speculative Decoding via Suffix Automaton

Yuxuan Hu; Ke Wang; Xiaokang Zhang; Fanjin Zhang; Cuiping Li; Hong Chen; Jing Zhang

arXiv:2411.10666·cs.CL·December 17, 2024

TL;DR

This paper introduces SAM Decoding, a novel suffix automaton-based speculative decoding method that significantly accelerates large language model inference by efficiently finding exact suffix matches, outperforming existing retrieval-based SD techniques.

Contribution

The paper proposes SAM Decoding, a suffix automaton approach to speculative decoding that makes draft generation faster and more accurate, and that can be combined with existing methods for broader domain applicability.

Findings

01 SAM Decoding is 18% faster than other retrieval-based SD methods.

02 Combining SAM Decoding with EAGLE-2 yields an additional 3.28%-11.13% speedup.

03 The method finds the exact longest suffix match in O(1) average time per generation step, covering both the SAM update and suffix retrieval.
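The exact suffix matching mentioned in the findings can be illustrated with a minimal sketch of an online suffix automaton. This is not the authors' implementation: the class name `SuffixAutomaton`, the methods `extend`, `step`, and `draft`, and the toy word-level corpus are all assumptions made for illustration. Each state stores the first position where its substrings end, so the tokens that followed the match in the corpus can be proposed as a draft.

```python
class SuffixAutomaton:
    """Illustrative online suffix automaton over tokens (hypothetical API)."""

    def __init__(self):
        self.tokens = []      # the indexed token sequence
        self.next = [{}]      # per-state transitions: token -> state
        self.link = [-1]      # suffix links
        self.length = [0]     # longest substring length per state
        self.end_pos = [-1]   # first end position of each state's substrings
        self.last = 0         # state representing the whole sequence

    def _new_state(self, length, end_pos, trans=None, link=-1):
        self.next.append(dict(trans or {}))
        self.link.append(link)
        self.length.append(length)
        self.end_pos.append(end_pos)
        return len(self.next) - 1

    def extend(self, tok):
        """Append one token; amortized constant number of state updates."""
        self.tokens.append(tok)
        cur = self._new_state(self.length[self.last] + 1, len(self.tokens) - 1)
        p = self.last
        while p != -1 and tok not in self.next[p]:
            self.next[p][tok] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][tok]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone keeps q's transitions and first end position.
                clone = self._new_state(self.length[p] + 1, self.end_pos[q],
                                        self.next[q], self.link[q])
                while p != -1 and self.next[p].get(tok) == q:
                    self.next[p][tok] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def step(self, state, match_len, tok):
        """Advance the longest-suffix match of the context by one token."""
        while state != 0 and tok not in self.next[state]:
            state = self.link[state]          # fall back to a shorter suffix
            match_len = self.length[state]
        if tok in self.next[state]:
            return self.next[state][tok], match_len + 1
        return 0, 0                           # token never seen: no match

    def draft(self, state, k):
        """Propose up to k tokens that followed the matched suffix."""
        if state == 0:
            return []
        start = self.end_pos[state] + 1
        return self.tokens[start:start + k]


sam = SuffixAutomaton()
for tok in "the cat sat on the mat".split():
    sam.extend(tok)

state, match_len = 0, 0
for tok in ["a", "cat", "sat"]:               # context being generated
    state, match_len = sam.step(state, match_len, tok)

print(match_len, sam.draft(state, 2))         # 2 ['on', 'the']
```

Because `step` only follows suffix links when a transition is missing, the total work over a sequence is linear in its length, which is one way to read the O(1) average cost per generation step.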

Abstract

Speculative decoding (SD) has been demonstrated as an effective technique for lossless LLM inference acceleration. Retrieval-based SD methods, one kind of model-free method, have yielded promising speedup, but they often rely on incomplete retrieval resources, inefficient retrieval methods, and are constrained to certain domains. This paper presents a novel retrieval-based speculative decoding method that adapts suffix automaton (SAM) for efficient and accurate draft generation by utilizing common text corpus and dynamic text sequence. Unlike existing $n$-gram matching methods, SAM-Decoding finds the exact longest suffix match, achieving an average time complexity of O(1) per generation step of SAM update and suffix retrieval. It can also integrate with existing methods, adaptively selecting a draft generation strategy based on match length to generalize to broader domains. Extensive…
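The adaptive strategy selection described in the abstract could look roughly like the sketch below. The function `choose_draft`, the threshold `min_match_len`, and the `fallback_drafter` callable (standing in for an existing method such as EAGLE-2) are hypothetical names introduced here; the paper's actual selection rule may differ.

```python
from typing import Callable, List, Sequence, Tuple


def choose_draft(
    match_len: int,
    retrieval_draft: Sequence[str],
    fallback_drafter: Callable[[Sequence[str]], List[str]],
    context: Sequence[str],
    min_match_len: int = 3,  # hypothetical threshold, not taken from the paper
) -> Tuple[str, List[str]]:
    """Pick a draft source based on how long the suffix match is."""
    # A long exact match suggests the retrieved continuation is reliable.
    if match_len >= min_match_len and retrieval_draft:
        return "retrieval", list(retrieval_draft)
    # Otherwise defer to another drafting method for this step.
    return "fallback", fallback_drafter(context)


def dummy_drafter(context: Sequence[str]) -> List[str]:
    # Placeholder for a model-based drafter; returns a fixed token here.
    return ["<draft>"]


print(choose_draft(5, ["on", "the"], dummy_drafter, ["cat", "sat"]))
print(choose_draft(1, ["on", "the"], dummy_drafter, ["cat", "sat"]))
```

In either case the target model still verifies the drafted tokens, so the choice of drafter affects speed but not the generated output.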

Code & Models

Repositories

hyx1999/sam-decoding
PyTorch · Official

Videos

SAM Decoding: Speculative Decoding via Suffix Automaton · underline

Taxonomy

Topics: Algorithms and Data Compression · Coding Theory and Cryptography · Semigroups and Automata Theory

Methods: Segment Anything Model · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings