Next-Token Prediction and Regret Minimization
Mehryar Mohri, Clayton Sanford, Jon Schneider, Kiran Vodrahalli, Yifan Wu

TL;DR
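This paper analyzes how next-token prediction models can be used in adversarial online decision-making environments. With unbounded context windows, every distribution over opponent action sequences is exponentially close (in TV distance) to a low-regret distribution; with bounded context windows, a distribution may be far from any low-regret distribution.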
Contribution
It demonstrates that unbounded context models can approximate low-regret distributions with negligible accuracy loss, and that transformer architectures can implement and learn these distributions.
Findings
With unbounded context, every distribution is exponentially close (in TV distance) to a low-regret distribution.
Bounded context models may be far from any low-regret distribution.
Transformers can efficiently implement and learn low-regret distributions.
Abstract
We consider the question of how to employ next-token prediction algorithms in adversarial online decision-making environments. Specifically, if we train a next-token prediction model on a distribution over sequences of opponent actions, when is it the case that the induced online decision-making algorithm (which approximately best responds to the model's predictions) has low adversarial regret (i.e., when is the distribution a "low-regret distribution")? For unbounded context windows (where the prediction made by the model can depend on all the actions taken by the adversary thus far), we show that although not every distribution is a low-regret distribution, every distribution is exponentially close (in TV distance) to a low-regret distribution, and hence sublinear regret can always be achieved at negligible cost to the accuracy of the…
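The setup in the abstract (predict the opponent's next action, best respond, measure regret against the best fixed action) can be illustrated with a toy sketch. This is not the paper's construction: the predictor `empirical_predictor`, the payoff matrix, and all function names are hypothetical choices made for illustration. The predictor simply uses smoothed empirical frequencies over the full history, and the example is meant to show why a naive predictor need not induce low regret.

```python
def best_response(probs, payoff):
    """Learner action with the highest expected payoff against the
    predicted distribution over opponent actions (ties -> lowest index)."""
    return max(
        range(len(payoff)),
        key=lambda a: sum(p * payoff[a][b] for b, p in enumerate(probs)),
    )

def empirical_predictor(history, n_opp_actions):
    """Hypothetical stand-in for a next-token model with unbounded context:
    Laplace-smoothed empirical frequencies of all opponent actions so far."""
    counts = [1.0] * n_opp_actions
    for b in history:
        counts[b] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def external_regret(opponent_seq, payoff, predictor):
    """Payoff of the best fixed action in hindsight minus the payoff
    earned by best responding to the predictor at every round."""
    history, achieved = [], 0.0
    for b in opponent_seq:
        probs = predictor(history, len(payoff[0]))
        a = best_response(probs, payoff)
        achieved += payoff[a][b]
        history.append(b)
    best_fixed = max(
        sum(payoff[a][b] for b in opponent_seq) for a in range(len(payoff))
    )
    return best_fixed - achieved

# Toy matching game: the learner earns 1 when it matches the opponent.
MATCHING = [[1, 0], [0, 1]]

constant_regret = external_regret([0] * 10, MATCHING, empirical_predictor)
alternating_regret = external_regret([1, 0] * 5, MATCHING, empirical_predictor)
```

On the constant sequence the learner matches every round and has zero regret. On the alternating sequence it mispredicts every round, so its regret is half the number of rounds and grows linearly. This is consistent with the abstract's remark that not every distribution is a low-regret distribution.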
