JEPA-Reasoner: Decoupling Latent Reasoning from Token Generation
Bingyang Kelvin Liu, Ziyu Patrick Chen, David P. Woodruff

TL;DR
JEPA-Reasoner is a decoupled language-model architecture that separates latent reasoning from token generation. Keeping token-level errors out of the reasoning chain and maintaining multiple hypotheses improves reasoning accuracy and robustness.
Contribution
This paper presents JEPA-Reasoner, a novel architecture that decouples latent reasoning from token generation, enhancing robustness and reasoning capabilities in language models.
Findings
149.5% improvement in 8-shot GSM8K accuracy for a 0.9B model
Error containment prevents token errors from affecting reasoning
Enables representation of multiple hypotheses via mixed latent vectors
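The last finding can be made concrete with a small sketch. The summary does not say how the "mixed latent vectors" are formed, so the code below assumes the simplest reading: a convex combination of one latent vector per hypothesis. The names `mix_latents` and `cosine` are hypothetical and are not taken from the paper.

```python
import math


def mix_latents(hypotheses, weights):
    """Blend hypothesis vectors into one latent vector.

    Assumption: the mixing rule is a convex combination (weights are
    normalized to sum to 1). The paper may use a different rule.
    """
    if not hypotheses or len(hypotheses) != len(weights):
        raise ValueError("need one weight per hypothesis vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]
    dim = len(hypotheses[0])
    return [
        sum(w * h[i] for w, h in zip(normalized, hypotheses))
        for i in range(dim)
    ]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two toy hypotheses as orthogonal directions in a 3-D latent space.
    hypothesis_a = [1.0, 0.0, 0.0]
    hypothesis_b = [0.0, 1.0, 0.0]
    mixed = mix_latents([hypothesis_a, hypothesis_b], [0.75, 0.25])
    # The mixed vector stays closer to the more heavily weighted hypothesis
    # while keeping a non-zero component along the other one.
    print(mixed, cosine(mixed, hypothesis_a), cosine(mixed, hypothesis_b))
```

Under this assumption a single vector carries both candidates at once, which a discrete token cannot do: sampling a token would force a commitment to one hypothesis.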
Abstract
Current autoregressive language models couple high-level reasoning and low-level token generation into a single sequential process, making the reasoning trajectory vulnerable to compounding expression errors. We propose JEPA-Reasoner, a novel architectural paradigm that decouples these tasks using a Joint-Embedding Predictive Architecture (JEPA) for pure latent-space reasoning and a separate Talker module for linguistic reconstruction. By isolating the reasoning engine from the discrete token-sampling process, our architecture enables: (1) Error Containment, where token-level failures cannot propagate into the latent reasoning chain; (2) Continuous Guidance, providing the generator with access to the entire lossless reasoning trajectory; and (3) Representation of Uncertainty, allowing the model to maintain multiple hypotheses via mixed latent vectors. Controlled experiments on synthetic…
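The error-containment claim in the abstract can be illustrated with a toy comparison. This is a sketch under strong assumptions, not the paper's implementation: the latent state is a single number, `reasoner_step` is a made-up deterministic update, and `talker` decodes one latent at a time (the abstract says the Talker sees the whole trajectory). All function names are hypothetical.

```python
import random


def embed(token):
    """Toy embedding: map a digit token back to a 1-D latent value."""
    return float(token)


def reasoner_step(z):
    """Toy latent predictor standing in for the JEPA reasoner."""
    return (z + 3.0) % 10.0


def talker(z, rng, error_rate):
    """Decode a latent to a digit token, with simulated sampling errors."""
    token = int(round(z)) % 10
    if rng.random() < error_rate:
        token = (token + 1) % 10  # injected token-level mistake
    return token


def decoupled_rollout(z0, steps, rng, error_rate):
    """Reason entirely in latent space, then decode the finished trajectory."""
    latents = []
    z = z0
    for _ in range(steps):
        z = reasoner_step(z)
        latents.append(z)
    tokens = [talker(latent, rng, error_rate) for latent in latents]
    return latents, tokens


def coupled_rollout(z0, steps, rng, error_rate):
    """Autoregressive baseline: each sampled token is fed back as input."""
    latents, tokens = [], []
    z = z0
    for _ in range(steps):
        z = reasoner_step(z)
        token = talker(z, rng, error_rate)
        latents.append(z)
        tokens.append(token)
        z = embed(token)  # a wrong token now corrupts the next state
    return latents, tokens


if __name__ == "__main__":
    clean, _ = decoupled_rollout(0.0, 5, random.Random(0), error_rate=0.0)
    noisy_decoupled, _ = decoupled_rollout(0.0, 5, random.Random(0), error_rate=1.0)
    noisy_coupled, _ = coupled_rollout(0.0, 5, random.Random(0), error_rate=1.0)
    print("clean latents:    ", clean)
    print("decoupled + noise:", noisy_decoupled)
    print("coupled + noise:  ", noisy_coupled)
```

In the decoupled rollout the latent trajectory is identical with and without token errors, because no sampled token is ever fed back into the reasoner. In the coupled rollout the same errors change every subsequent state.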
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Topic Modeling · Multimodal Machine Learning Applications
