Next-Latent Prediction Transformers Learn Compact World Models

Jayden Teoh; Manan Tomar; Kwangjun Ahn; Edward S. Hu; Pratyusha Sharma; Riashat Islam; Alex Lamb; John Langford

arXiv:2511.05963 · cs.LG · November 11, 2025

TL;DR

Next-Latent Prediction (NextLat) adds a self-supervised latent-space prediction objective to standard next-token training. The objective encourages transformers to learn compact, belief-state-like representations, which improves generalization in sequence modeling tasks.

Contribution

The paper proposes NextLat, a novel auxiliary training objective for transformers that induces belief-state-like latent representations, enhancing their ability to model and generalize in sequential tasks.

Findings

01. Significant improvements in downstream accuracy across benchmarks.

02. Enhanced representation compression and interpretability.

03. Better performance in reasoning, planning, and language modeling tasks.

Abstract

Transformers replace recurrence with a memory that grows with sequence length and self-attention that enables ad hoc lookups over past tokens. Consequently, they lack an inherent incentive to compress history into compact latent states with consistent transition rules, which often leads to learned solutions that generalize poorly. We introduce Next-Latent Prediction (NextLat), which extends standard next-token training with self-supervised predictions in the latent space. Specifically, NextLat trains a transformer to learn latent representations that are predictive of its next latent state given the next output token. Theoretically, we show that these latents provably converge to belief states, that is, compressed information about the history necessary to predict the future. This simple auxiliary objective also injects a recurrent inductive bias into transformers, while leaving their…
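The objective described in the abstract, predicting the next latent state from the current latent and the next token, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear dynamics model, the squared-error distance, the weight parameter, and the names `linear_dynamics`, `next_latent_loss`, and `combined_loss`.

```python
def linear_dynamics(h, token_emb, W_h, W_x):
    """Hypothetical latent dynamics model: maps (h_t, x_{t+1}) to a predicted h_{t+1}.

    A linear map is an assumption chosen for brevity, not the paper's architecture.
    """
    return [
        sum(W_h[i][j] * h[j] for j in range(len(h)))
        + sum(W_x[i][k] * token_emb[k] for k in range(len(token_emb)))
        for i in range(len(W_h))
    ]


def next_latent_loss(latents, next_token_embs, W_h, W_x):
    """Mean squared error between predicted and actual next latents.

    latents[t] is the latent h_t; next_token_embs[t] is an embedding of token x_{t+1}.
    Squared error is an assumed distance, used only to illustrate the idea.
    """
    total, count = 0.0, 0
    for t in range(len(latents) - 1):
        pred = linear_dynamics(latents[t], next_token_embs[t], W_h, W_x)
        target = latents[t + 1]  # the model's own latent at the next step
        total += sum((p - y) ** 2 for p, y in zip(pred, target))
        count += len(target)
    return total / count


def combined_loss(next_token_loss, latent_loss, weight=1.0):
    """Auxiliary-objective pattern: next-token loss plus a weighted latent loss."""
    return next_token_loss + weight * latent_loss


# Toy example: two-dimensional latents, one-dimensional token embeddings.
W_h = [[1.0, 0.0], [0.0, 1.0]]
W_x = [[1.0], [0.0]]
token_embs = [[1.0], [2.0]]
consistent = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]    # follows the dynamics exactly
inconsistent = [[0.0, 0.0], [1.0, 0.0], [3.0, 1.0]]  # last latent deviates

loss_consistent = next_latent_loss(consistent, token_embs, W_h, W_x)
loss_inconsistent = next_latent_loss(inconsistent, token_embs, W_h, W_x)
```

In this toy setting, latents that obey a consistent transition rule incur zero latent loss, while latents that deviate from it are penalized. That is the sense in which such an auxiliary term rewards consistent transition rules.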

Code & Models

Datasets

JaydenTeoh/manhattan
dataset · 499 downloads

Videos

Next-Latent Prediction Transformers Learn Compact World Models

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare