Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators

Zhengyang Su; Isay Katsman; Yueqi Wang; Ruining He; Lukasz Heldt; Raghunandan Keshavan; Shao-Chuan Wang; Xinyang Yi; Mingyan Gao; Onkar Dalal; Lichan Hong; Ed Chi; Ningren Han

arXiv:2602.22647 · cs.IR · February 27, 2026

TL;DR

This paper introduces STATIC, a vectorized constrained decoding method for LLM-based generative retrieval. It is designed to run efficiently and at scale on hardware accelerators (TPUs/GPUs), which makes constrained decoding practical for industrial deployment.

Contribution

STATIC transforms Trie-based constrained decoding into vectorized sparse matrix operations, achieving massive speedups and low latency overhead for large-scale industrial recommender systems.

Findings

01. 948x speedup over CPU trie implementation
02. 47-1033x speedup over binary-search baseline
03. Low latency overhead of 0.033 ms per step

Abstract

Generative retrieval has emerged as a powerful paradigm for LLM-based recommendation. However, industrial recommender systems often benefit from restricting the output space to a constrained subset of items based on business logic (e.g. enforcing content freshness or product category), which standard autoregressive decoding cannot natively support. Moreover, existing constrained decoding methods that make use of prefix trees (Tries) incur severe latency penalties on hardware accelerators (TPUs/GPUs). In this work, we introduce STATIC (Sparse Transition Matrix-Accelerated Trie Index for Constrained Decoding), an efficient and scalable constrained decoding technique designed specifically for high-throughput LLM-based generative retrieval on TPUs/GPUs. By flattening the prefix tree into a static Compressed Sparse Row (CSR) matrix, we transform irregular tree traversals into fully…
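The flattening idea described in the abstract can be illustrated with a small sketch. This is not the paper's STATIC implementation: it is a CPU-only NumPy illustration under assumptions of our own. The function names (`build_csr_trie`, `constrained_step`, `decode`), the three-array CSR layout (`row_ptr`, `col_tok`, `child`), the greedy argmax decoding, and the toy item IDs are all hypothetical. It also assumes every allowed item has the same length, so a decoding step never starts from a leaf node.

```python
import numpy as np

def build_csr_trie(sequences):
    """Flatten a prefix tree over token-ID sequences into CSR-style arrays.

    Node 0 is the root. For node n, the allowed next tokens are
    col_tok[row_ptr[n]:row_ptr[n + 1]] and child[...] holds the node reached.
    """
    children = [{}]  # children[n] maps token -> child node id
    for seq in sequences:
        node = 0
        for tok in seq:
            if tok not in children[node]:
                children[node][tok] = len(children)
                children.append({})
            node = children[node][tok]
    row_ptr = np.zeros(len(children) + 1, dtype=np.int64)
    col_tok, child = [], []
    for n, edges in enumerate(children):
        for tok in sorted(edges):
            col_tok.append(tok)
            child.append(edges[tok])
        row_ptr[n + 1] = len(col_tok)
    return row_ptr, np.array(col_tok, dtype=np.int64), np.array(child, dtype=np.int64)

def constrained_step(logits, states, row_ptr, col_tok, child):
    """One greedy decoding step for a whole batch, with no per-row tree walk.

    logits: [B, V] floats; states: [B] current trie node per row (non-leaf).
    """
    B, V = logits.shape
    start = row_ptr[states]
    deg = row_ptr[states + 1] - start            # number of allowed tokens per row
    offs = np.arange(int(deg.max()))[None, :]    # [1, D], D = max out-degree in batch
    valid = offs < deg[:, None]                  # [B, D], False marks padding slots
    idx = np.where(valid, start[:, None] + offs, 0)  # padded slots gather index 0
    toks = col_tok[idx]                          # [B, D] candidate tokens
    rows = np.broadcast_to(np.arange(B)[:, None], toks.shape)
    mask = np.zeros((B, V), dtype=bool)
    mask[rows[valid], toks[valid]] = True        # scatter allowed tokens into the mask
    next_tok = np.where(mask, logits, -np.inf).argmax(axis=1)
    hit = (toks == next_tok[:, None]) & valid    # locate the chosen edge in each row
    next_state = child[idx][np.arange(B), hit.argmax(axis=1)]
    return next_tok, next_state

def decode(logits_per_step, row_ptr, col_tok, child):
    """Greedy constrained decoding; every row starts at the trie root."""
    states = np.zeros(logits_per_step[0].shape[0], dtype=np.int64)
    out = []
    for logits in logits_per_step:
        tok, states = constrained_step(logits, states, row_ptr, col_tok, child)
        out.append(tok)
    return np.stack(out, axis=1)

# Toy usage: three allowed items over a vocabulary of 8 tokens.
row_ptr, col_tok, child = build_csr_trie([[3, 1, 4], [3, 1, 5], [2, 7, 1]])
up = np.arange(8, dtype=float)
logits = np.stack([up, -up])  # row 0 prefers large token IDs, row 1 small ones
print(decode([logits] * 3, row_ptr, col_tok, child).tolist())  # [[3, 1, 5], [2, 7, 1]]
```

Because the arrays are built once and each step uses only gathers, comparisons, and a scatter over fixed-shape tensors, the same pattern could in principle be expressed in an accelerator framework. How STATIC itself lays out and queries its sparse transition matrix is not covered by the truncated abstract above.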

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Recommender Systems and Techniques · Information Retrieval and Search Behavior