Most Likely Sequence Generation for $n$-Grams, Transformers, HMMs, and Markov Chains, by Using Rollout Algorithms

Yuchao Li; Dimitri Bertsekas

arXiv:2403.15465·cs.LG·March 26, 2024·1 cites

Open Access

TL;DR

This paper introduces polynomial-time methods, based on rollout algorithms, for generating highly likely word sequences from a transformer with an $n$-gram structure, such as the one underlying ChatGPT. The methods improve on the greedy next-word heuristic and extend to HMMs and Markov chains.

Contribution

It proposes a rollout-based approach for efficiently computing highly likely sequences from transformers with an $n$-gram structure, and extends the approach to HMMs and Markov chains.

Findings

01

The rollout methods generate sequences with higher likelihood than the greedy heuristic.

02

In computational experiments, the methods improve sequence likelihood at the cost of a modest increase in computation.

03

The methods apply to a broad class of finite-state Markov models and inference tasks.

Abstract

In this paper we consider a transformer with an $n$-gram structure, such as the one underlying ChatGPT. The transformer provides next word probabilities, which can be used to generate word sequences. We consider methods for computing word sequences that are highly likely, based on these probabilities. Computing the optimal (i.e., most likely) word sequence starting with a given initial state is an intractable problem, so we propose methods to compute highly likely sequences of $N$ words in time that is a low order polynomial in $N$ and in the vocabulary size of the $n$-gram. These methods are based on the rollout approach from approximate dynamic programming, a form of single policy iteration, which can improve the performance of any given heuristic policy. In our case we use a greedy heuristic that generates as next word one that has the highest probability. We show with analysis,…
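The rollout idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small finite-state Markov chain given as a row-stochastic matrix `P` (a hypothetical toy example standing in for the transformer's next-word probabilities), and the function names `log_prob`, `greedy_sequence`, and `rollout_sequence` are illustrative. At each step, rollout scores every candidate next state by its transition probability times the probability of the greedy completion from that state, and picks the best candidate.

```python
import math

def log_prob(P, x0, seq):
    """Log-probability of visiting `seq` in order, starting from state x0."""
    total, prev = 0.0, x0
    for x in seq:
        p = P[prev][x]
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
        prev = x
    return total

def greedy_sequence(P, x0, N):
    """Base heuristic: always move to the most probable next state."""
    seq, cur = [], x0
    for _ in range(N):
        row = P[cur]
        cur = max(range(len(row)), key=lambda x: row[x])
        seq.append(cur)
    return seq

def rollout_sequence(P, x0, N):
    """One-step lookahead rollout with the greedy heuristic as base policy."""
    seq, cur = [], x0
    for k in range(N):
        remaining = N - k - 1  # steps left after choosing the next state

        def score(x, cur=cur, remaining=remaining):
            # Candidate x followed by the greedy completion from x.
            tail = greedy_sequence(P, x, remaining)
            return log_prob(P, cur, [x] + tail)

        cur = max(range(len(P)), key=score)
        seq.append(cur)
    return seq

# Hypothetical 3-state chain (assumption, not from the paper).
P = [
    [0.10, 0.50, 0.40],
    [0.34, 0.33, 0.33],
    [0.05, 0.05, 0.90],
]
greedy = greedy_sequence(P, 0, 3)
rollout = rollout_sequence(P, 0, 3)
print(greedy, math.exp(log_prob(P, 0, greedy)))
print(rollout, math.exp(log_prob(P, 0, rollout)))
```

On this toy matrix, greedy is drawn to state 1 by the first transition and ends with a sequence of probability 0.085, while rollout accepts a slightly less likely first step into state 2 and reaches probability 0.324. Each rollout step runs one greedy completion per candidate state, so the cost of this sketch stays polynomial in $N$ and the number of states.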

Taxonomy

Topics: Algorithms and Data Compression