OneLatent: Single-Token Compression for Visual Latent Reasoning

Bo Lv; Yasheng Sun; Junjie Wang; Haoxiang Shi

arXiv:2602.13738·cs.AI·February 17, 2026

Open Access

TL;DR

OneLatent introduces a method to compress intermediate reasoning steps into a single latent token using image rendering and OCR supervision, significantly reducing output length and inference cost while maintaining high accuracy.

Contribution

The paper proposes a single-token latent reasoning framework that uses rendered CoT images and OCR hidden-state supervision to condense intermediate reasoning into one latent token.

Findings

01

Reduces average output length by 11 times with only a 2.21% average accuracy drop relative to textual CoT

02

Achieves up to 87.4 times compression on long-chain logical reasoning tasks

03

Reaches 99.80% on ProntoQA and 97.80% on ProsQA with a single latent token
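The length and accuracy figures above are ratios and differences between a textual CoT baseline and the compressed model. The sketch below shows that arithmetic only. The function names are hypothetical, the token counts and accuracies are made-up illustration values rather than numbers from the paper, and treating the accuracy drop as an absolute difference in percentage points is an assumption.

```python
def compression_ratio(baseline_tokens: int, compressed_tokens: int) -> float:
    """How many times shorter the compressed output is than the baseline."""
    if compressed_tokens <= 0:
        raise ValueError("compressed_tokens must be positive")
    return baseline_tokens / compressed_tokens


def accuracy_drop(baseline_acc: float, model_acc: float) -> float:
    """Absolute drop in percentage points (assumed definition)."""
    return baseline_acc - model_acc


# Hypothetical values chosen for illustration, not taken from the paper.
ratio = compression_ratio(baseline_tokens=220, compressed_tokens=20)
drop = accuracy_drop(baseline_acc=90.0, model_acc=88.0)
print(f"{ratio:.1f}x shorter, {drop:.2f} points lower")
```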

Abstract

Chain-of-thought (CoT) prompting improves reasoning but often increases inference cost by one to two orders of magnitude. To address these challenges, we present OneLatent, a framework that compresses intermediate reasoning into a single latent token via supervision from rendered CoT images and DeepSeek-OCR hidden states. By rendering textual steps into images, we obtain a deterministic supervision signal that can be inspected and audited without requiring the model to output verbose textual rationales. Across benchmarks, OneLatent reduces average output length by $11\times$ with only a $2.21\%$ average accuracy drop relative to textual CoT, while improving output token contribution (OTC) by $6.8\times$. On long-chain logical reasoning, OneLatent reaches $99.80\%$ on ProntoQA and $97.80\%$ on ProsQA with one latent token, with compression up to $87.4\times$, supporting…
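One way to picture supervising a single latent token with hidden states from an OCR model is as an alignment loss between two vectors. The sketch below is a toy illustration under stated assumptions, not the paper's method: the abstract does not specify the loss, the pooling, or the dimensions, so mean pooling, cosine distance, and every name here are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(states: List[Vector]) -> Vector:
    """Average a list of equal-length hidden-state vectors (assumed pooling)."""
    n = len(states)
    dim = len(states[0])
    return [sum(s[i] for s in states) / n for i in range(dim)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; 0 when the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def alignment_loss(latent_state: Vector, target_states: List[Vector]) -> float:
    """Distance between one latent vector and the pooled target states.

    `target_states` stands in for hidden states an OCR encoder might
    produce for a rendered CoT image (hypothetical stand-in data).
    """
    return cosine_distance(latent_state, mean_pool(target_states))


# Toy 2-dimensional example with made-up vectors.
targets = [[1.0, 0.0], [0.0, 1.0]]
aligned = alignment_loss([1.0, 1.0], targets)
orthogonal = alignment_loss([1.0, -1.0], targets)
```

In this toy setup a latent vector pointing in the same direction as the pooled target gives a loss near zero, and an orthogonal one gives a loss of one. A real training setup would compute such a loss on model tensors and backpropagate through it.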

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling