Discrete JEPA: Learning Discrete Token Representations without Reconstruction

Junyeob Baek; Hosung Lee; Christopher Hoang; Mengye Ren; Sungjin Ahn

arXiv:2506.14373 · cs.CV · June 24, 2025

TL;DR

Discrete-JEPA is a semantic tokenization framework aimed at symbolic reasoning and systematic inference. It outperforms baselines on visual symbolic prediction tasks, and its learned semantic token space shows emergent systematic patterns.

Contribution

Discrete-JEPA extends the latent predictive coding framework with semantic tokenization and novel complementary objectives, producing tokens suited to symbolic reasoning tasks without relying on reconstruction, unlike prior image tokenization approaches.
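The idea of predicting in a discrete latent space rather than reconstructing pixels can be illustrated with a small sketch. This is not the authors' implementation: this page does not specify the architecture or the objectives, so the nearest-neighbour codebook lookup, the linear predictor, the mean-squared-error loss, and every name below (`quantize`, `latent_prediction_loss`, `codebook`, `predictor`) are assumptions chosen only for illustration.

```python
import numpy as np

def quantize(z, codebook):
    """Map each embedding in z (N, D) to its nearest row of codebook (K, D)."""
    # Squared Euclidean distance between every embedding and every code.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    ids = d2.argmin(axis=1)          # discrete token index per embedding
    return ids, codebook[ids]        # indices and their code vectors

def latent_prediction_loss(predicted, target):
    """Mean squared error between latents; no pixel decoder is involved."""
    return float(((predicted - target) ** 2).mean())

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))         # hypothetical: 8 codes, dimension 4
context_latent = rng.normal(size=(3, 4))   # stand-in for a context encoder output
target_latent = rng.normal(size=(3, 4))    # stand-in for a target encoder output
predictor = np.eye(4)                      # hypothetical linear predictor (identity here)

token_ids, context_tokens = quantize(context_latent, codebook)
_, target_tokens = quantize(target_latent, codebook)

# The loss compares predicted and target tokens in latent space only.
loss = latent_prediction_loss(context_tokens @ predictor, target_tokens)
print(token_ids, loss)
```

In this toy setup the training signal comes entirely from agreement between predicted and target latents, which is the sense in which such a scheme avoids reconstruction. A real system would learn the encoders, codebook, and predictor jointly.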

Findings

01. Outperforms baselines on visual symbolic prediction tasks

02. Emergence of systematic patterns in learned semantic token space

03. Potential to advance symbolic world modeling and planning in AI

Abstract

The cornerstone of cognitive intelligence lies in extracting hidden patterns from observations and leveraging these principles to systematically predict future outcomes. However, current image tokenization methods demonstrate significant limitations in tasks requiring symbolic abstraction and logical reasoning capabilities essential for systematic inference. To address this challenge, we propose Discrete-JEPA, extending the latent predictive coding framework with semantic tokenization and novel complementary objectives to create robust tokenization for symbolic reasoning tasks. Discrete-JEPA dramatically outperforms baselines on visual symbolic prediction tasks, while striking visual evidence reveals the spontaneous emergence of deliberate systematic patterns within the learned semantic token space. Though an initial model, our approach promises a significant impact for advancing…
