Token Bottleneck: One Token to Remember Dynamics

Taekyung Kim; Dongyoon Han; Byeongho Heo; Jeongeun Park; Sangdoo Yun

arXiv:2507.06543 · cs.CV · March 9, 2026

TL;DR

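Token Bottleneck (ToBo) is a self-supervised learning method that squeezes a reference scene into a single compact bottleneck token and predicts the subsequent scene from that token plus a few target patches given as hints. This pushes the vision backbone to capture temporal dependencies useful for tasks such as video label propagation and robotic manipulation.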

Contribution

Introduces ToBo, a novel self-supervised pipeline that learns compact, temporally aware scene representations using minimal scene hints, improving dynamic scene understanding.

Findings

01 Outperforms baselines in video label propagation and robot manipulation tasks

02 Demonstrates robustness and effectiveness in real-world robotic experiments

03 Scalable across different model sizes

Abstract

Deriving compact and temporally aware visual representations from dynamic scenes is essential for successful execution of sequential scene understanding tasks such as visual tracking and robotic manipulation. In this paper, we introduce Token Bottleneck (ToBo), a simple yet intuitive self-supervised learning pipeline that squeezes a scene into a bottleneck token and predicts the subsequent scene using minimal patches as hints. The ToBo pipeline facilitates the learning of sequential scene representations by conservatively encoding the reference scene into a compact bottleneck token during the squeeze step. In the reconstruction step, we guide the model to capture temporal dynamics by predicting the target scene using the bottleneck token along with a few target patches as hints. This design encourages the vision backbone to embed temporal dependencies, thereby enabling understanding of…
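The squeeze and reconstruction steps described in the abstract can be illustrated with a small data-flow sketch. This is not the authors' implementation: the function names, the mean-pooling stand-in for the learned encoder, and the 10% hint ratio are all assumptions made for illustration, since the abstract only says that "minimal patches" are used as hints. The sketch shows only what the decoder is given (one bottleneck token plus a few visible target patches) and which target patches it would have to predict.

```python
import random


def split_hint_and_masked(num_patches, hint_ratio, seed=0):
    """Randomly choose a few target patch indices to keep visible as hints.

    hint_ratio is a hypothetical parameter; the abstract does not state a value.
    """
    rng = random.Random(seed)
    num_hints = max(1, round(num_patches * hint_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    hints = sorted(order[:num_hints])
    masked = sorted(order[num_hints:])
    return hints, masked


def squeeze(reference_patches):
    """Stand-in for the squeeze step: collapse the reference scene to one token.

    Mean pooling is used purely for illustration; the method described in the
    abstract relies on a learned vision backbone, not a fixed average.
    """
    n = len(reference_patches)
    dim = len(reference_patches[0])
    return [sum(p[d] for p in reference_patches) / n for d in range(dim)]


def build_decoder_input(bottleneck, target_patches, hints):
    """Decoder input: the single bottleneck token followed by the hint patches."""
    return [bottleneck] + [target_patches[i] for i in hints]


# Toy scenes: 16 patches, each a 2-dimensional feature vector.
reference = [[float(i), float(i) + 1.0] for i in range(16)]
target = [[float(i) * 2.0, 0.0] for i in range(16)]

hints, masked = split_hint_and_masked(len(target), hint_ratio=0.1)
bottleneck = squeeze(reference)
decoder_input = build_decoder_input(bottleneck, target, hints)

# The reconstruction objective would ask a decoder to predict target[i]
# for every i in `masked`, given only `decoder_input`.
print(len(decoder_input), len(masked))
```

Because the decoder sees so little of the target scene directly, most of what it needs has to come through the bottleneck token, which is the pressure the abstract describes as encouraging the backbone to embed temporal dependencies.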

Videos

Token Bottleneck: One Token to Remember Dynamics · SlidesLive

Taxonomy

Topics: Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition