Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators