Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective
Leo Gagnon, Eric Elmoznino, Sarthak Mittal, Tom Marty, Tejas Kasetty, Dhanya Sridhar, Guillaume Lajoie

TL;DR
This paper argues that next-token prediction should be sensitive to ambiguity. Drawing on insights from cognitive science, it introduces a meta-learning benchmark and a method for improving model performance in high-ambiguity situations.
Contribution
The paper presents MetaHMM, a synthetic benchmark for studying ambiguity in sequence prediction, and proposes a method, motivated by cognitive theories, for improving how models handle ambiguous contexts.
Findings
Transformers struggle with high-ambiguity predictions.
The proposed method improves performance in ambiguous contexts.
Preliminary results show better capacity allocation and scalable inference.
Abstract
The rapid adaptation ability of auto-regressive foundation models is often attributed to the diversity of their pre-training data. This is because, from a Bayesian standpoint, minimizing prediction error in such settings requires integrating over all plausible latent hypotheses consistent with observations. While this behavior is desirable in principle, it often proves too ambitious in practice: under high ambiguity, the number of plausible latent alternatives makes Bayes-optimal prediction computationally intractable. Cognitive science has long recognized this limitation, suggesting that under such conditions, heuristics or information-seeking strategies are preferable to exhaustive inference. Translating this insight to next-token prediction, we hypothesize that low- and high-ambiguity predictions pose different computational demands, making ambiguity-agnostic next-token prediction a…
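As a rough illustration of the Bayesian claim above, the sketch below computes a posterior-predictive next-token distribution by integrating over a small, explicitly enumerated set of latent hypotheses. It is a toy example, not MetaHMM or the authors' method: the two i.i.d. token models, the uniform prior, and the names `posterior_predictive` and `entropy` are all assumptions made for illustration. Posterior entropy is used here only as one possible proxy for ambiguity.

```python
import math


def posterior_predictive(hypotheses, prior, context):
    """Posterior over hypotheses and Bayes-optimal next-token distribution.

    hypotheses: list of dicts mapping token -> probability. Each dict is a
                hypothetical i.i.d. token model standing in for a latent task.
                All probabilities are assumed to be strictly positive.
    prior:      list of prior weights over the hypotheses (sums to 1).
    context:    list of observed tokens.
    """
    # Unnormalised log-posterior: log prior + log-likelihood of the context.
    log_post = [
        math.log(p) + sum(math.log(h[t]) for t in context)
        for h, p in zip(hypotheses, prior)
    ]
    # Normalise in a numerically stable way (log-sum-exp trick).
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    posterior = [w / z for w in weights]
    # Integrate over hypotheses: mix each prediction by its posterior weight.
    tokens = list(hypotheses[0].keys())
    predictive = {
        t: sum(w * h[t] for w, h in zip(posterior, hypotheses)) for t in tokens
    }
    return posterior, predictive


def entropy(probs):
    """Shannon entropy in nats; higher means more hypotheses remain plausible."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Two competing hypotheses about the token source (illustrative values).
hypotheses = [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}]
prior = [0.5, 0.5]

post_short, pred_short = posterior_predictive(hypotheses, prior, ["a"])
post_long, pred_long = posterior_predictive(hypotheses, prior, ["a"] * 5)

print("short context:", post_short, pred_short, entropy(post_short))
print("long context: ", post_long, pred_long, entropy(post_long))
```

With only two hypotheses the sum is trivial. The cost the abstract refers to arises when the set of plausible hypotheses is large, since the loop over `hypotheses` then becomes the bottleneck. In this toy setting, a longer context concentrates the posterior and lowers its entropy.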
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
