Identifiable Token Correspondence for World Models

Youngin Kim; Ray Sun; Inho Kim; Bumsoo Park; Hyun Oh Song

arXiv:2605.16457·cs.LG·May 22, 2026

TL;DR

This paper introduces Identifiable Token Correspondence (ITC), a decoding step for token-based transformer world models in visual reinforcement learning that models token persistence across frames and reaches state-of-the-art results on 4 benchmarks.

Contribution

ITC formulates next-frame token prediction as a structured assignment problem, enhancing temporal consistency without altering existing transformer architectures.

Findings

01 Achieves state-of-the-art performance on 4 benchmarks.

02 Significantly improves scores on the Craftax-classic benchmark.

03 Enhances token persistence and consistency in long-horizon rollouts.

Abstract

Token-based transformer world models have shown strong performance in visual reinforcement learning, but often suffer from temporal inconsistency in long-horizon rollouts, including object duplication, disappearance, and transmutation. A key reason is that most existing approaches treat next-frame prediction purely as a token generation problem, without considering the persistence of tokens across time. We introduce Identifiable Token Correspondence (ITC), a decoding step for token-based transformer world models that formulates next-frame prediction as a structured assignment problem with latent token correspondence variables: each next-frame token is explained either by copying a token from the previous frame or by generating a new one. ITC leaves the transformer architecture and training procedure unchanged and can be added on top of existing backbones. Our experiments show…
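
The copy-or-generate idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simplified setting in which each next-frame position has its own independent correspondence variable, and it ignores any joint constraints that a structured assignment would impose. The function names (`token_marginal`, `decode_frame`) and the inputs (`copy_probs`, `gen_probs`, `gen_dists`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def token_marginal(
    prev_tokens: Sequence[int],
    copy_probs: Sequence[float],
    gen_prob: float,
    gen_dist: Dict[int, float],
) -> Dict[int, float]:
    """Marginal distribution over one next-frame token.

    Assumed inputs (hypothetical, not from the paper):
      copy_probs[j]: probability the token is copied from previous position j
      gen_prob:      probability the token is newly generated
      gen_dist:      generator distribution over vocabulary ids
    """
    total = sum(copy_probs) + gen_prob
    if abs(total - 1.0) > 1e-6:
        raise ValueError("correspondence probabilities must sum to 1")

    marginal: Dict[int, float] = {}
    # Copy branch: mass flows to whatever token sits at the source position.
    for j, p in enumerate(copy_probs):
        tok = prev_tokens[j]
        marginal[tok] = marginal.get(tok, 0.0) + p
    # Generate branch: mass is spread according to the generator.
    for tok, p in gen_dist.items():
        marginal[tok] = marginal.get(tok, 0.0) + gen_prob * p
    return marginal


def decode_frame(
    prev_tokens: Sequence[int],
    copy_probs: Sequence[Sequence[float]],
    gen_probs: Sequence[float],
    gen_dists: Sequence[Dict[int, float]],
) -> List[int]:
    """Greedy decode: pick the most probable token at each position."""
    frame = []
    for i in range(len(copy_probs)):
        m = token_marginal(prev_tokens, copy_probs[i], gen_probs[i], gen_dists[i])
        frame.append(max(m, key=m.get))
    return frame


# Toy example with a three-token previous frame.
prev = [7, 7, 3]
copy_probs = [[0.9, 0.0, 0.0], [0.0, 0.2, 0.1], [0.0, 0.0, 0.5]]
gen_probs = [0.1, 0.7, 0.5]
gen_dists = [{5: 1.0}, {4: 0.9, 7: 0.1}, {3: 0.2, 9: 0.8}]
print(decode_frame(prev, copy_probs, gen_probs, gen_dists))  # [7, 4, 3]
```

In the toy example, the first position persists by copying token 7, the second is dominated by the generate branch and yields the new token 4, and the third keeps token 3 because the copy and generate branches both place mass on it.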

Code & Models

Repositories

snu-mllab/Identifiable-Token-Correspondence (GitHub)
