Most Likely Sequence Generation for $n$-Grams, Transformers, HMMs, and Markov Chains, by Using Rollout Algorithms
Yuchao Li, Dimitri Bertsekas

TL;DR
This paper introduces polynomial-time methods, based on rollout algorithms, for generating highly likely word sequences from transformers with an $n$-gram structure, such as the one underlying ChatGPT. The methods improve on the greedy next-word heuristic and extend to HMMs and Markov chains.
Contribution
It proposes a rollout-based approach that uses a greedy heuristic as its base policy to compute highly likely sequences for transformers with an $n$-gram structure, and extends the approach to HMMs and Markov chains.
Findings
The rollout methods generate sequences with higher likelihood than the greedy heuristic.
Computational experiments show that this improvement in sequence likelihood comes with a modest increase in computation.
The methods apply to a broad class of finite-state Markov models and inference tasks.
Abstract
In this paper we consider a transformer with an $n$-gram structure, such as the one underlying ChatGPT. The transformer provides next word probabilities, which can be used to generate word sequences. We consider methods for computing word sequences that are highly likely, based on these probabilities. Computing the optimal (i.e., most likely) word sequence starting with a given initial state is an intractable problem, so we propose methods to compute highly likely sequences of $N$ words in time that is a low order polynomial in $N$ and in the vocabulary size of the $n$-gram. These methods are based on the rollout approach from approximate dynamic programming, a form of single policy iteration, which can improve the performance of any given heuristic policy. In our case we use a greedy heuristic that generates as next word one that has the highest probability. We show with analysis,…
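The rollout idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a first-order Markov chain given as a row-stochastic transition matrix `P` (a hypothetical toy example) in place of a transformer's next-word probabilities, and the function names `greedy` and `rollout` are illustrative. At each step, rollout scores every candidate next state by its transition probability times the probability of the greedy completion from that state, and then selects the best-scoring candidate.

```python
import math

def greedy(P, x, steps):
    """Greedy base heuristic: repeatedly move to the most probable next state.
    Returns the generated states and the log-probability of the sequence."""
    seq, logp = [], 0.0
    for _ in range(steps):
        # Ties are broken deterministically by the lowest state index.
        y = max(range(len(P)), key=lambda j: P[x][j])
        logp += math.log(P[x][y])
        seq.append(y)
        x = y
    return seq, logp

def rollout(P, x0, N):
    """Rollout with the greedy heuristic as base policy (illustrative sketch).
    Each candidate next state is scored by its one-step log-probability plus
    the log-probability of the greedy completion over the remaining steps."""
    x, seq, logp = x0, [], 0.0
    for k in range(N):
        best_y, best_score = None, -math.inf
        for y in range(len(P)):
            if P[x][y] <= 0.0:
                continue  # skip impossible transitions
            _, tail = greedy(P, y, N - k - 1)
            score = math.log(P[x][y]) + tail
            if score > best_score:
                best_y, best_score = y, score
        logp += math.log(P[x][best_y])
        seq.append(best_y)
        x = best_y
    return seq, logp

# Hypothetical 3-state chain: greedy is lured by the 0.6 transition out of
# state 0, while rollout sees that state 2 leads to a more likely completion.
P = [
    [0.0, 0.6, 0.4],
    [0.4, 0.3, 0.3],
    [0.0, 0.0, 1.0],
]
g_seq, g_logp = greedy(P, 0, 3)
r_seq, r_logp = rollout(P, 0, 3)
print("greedy :", g_seq, math.exp(g_logp))
print("rollout:", r_seq, math.exp(r_logp))
```

In this toy chain the greedy sequence is `[1, 0, 1]` with probability of about 0.144, while rollout selects `[2, 2, 2]` with probability of about 0.4. Each rollout step runs one greedy completion per candidate state, so the sketch costs on the order of $N^2$ times the square of the number of states, which is polynomial in the sequence length and the number of states.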
Taxonomy
Topics: Algorithms and Data Compression
