TL;DR
LaTER (Latent-Then-Explicit Reasoning) is a two-stage reasoning paradigm for large language models: bounded exploration in a continuous latent space, followed by explicit chain-of-thought for verification and answer generation. It reduces token usage while improving accuracy.
Contribution
The paper introduces a training-free instantiation of LaTER and a supervised corpus, improving both reasoning efficiency and accuracy over standard chain-of-thought methods.
Findings
Reduces token usage by 16%-32% on benchmarks.
Improves AIME 2025 accuracy from 70.0% to 73.3%.
Fine-tuning with LaTER achieves 80.0% accuracy on AIME 2025.
Abstract
Chain-of-thought (CoT) reasoning improves large language models (LLMs) on difficult tasks, but it also makes inference expensive because every intermediate step must be generated as a discrete token. Latent reasoning reduces visible token generation by propagating continuous states, yet replacing explicit derivations with latent computation can hurt tasks that require symbolic checking. We propose Latent-Then-Explicit Reasoning (LaTER), a two-stage paradigm that first performs bounded exploration in a continuous latent space and then switches to explicit CoT for verification and answer generation. In a training-free instantiation, LaTER projects final-layer hidden states back to the input embedding space, preserves the latent KV cache, and uses entropy and model-native stop-token probes to decide when to switch. We find that strong reasoning models already exhibit structured latent…
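The switching logic described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The abstract does not say how hidden states are projected back to the embedding space, so this sketch assumes a probability-weighted average of input embedding rows. It also assumes the switch fires when entropy drops below a threshold, when the stop-token probability exceeds a threshold, or when a latent step budget runs out. All function names, thresholds, and the `step_fn` callback (a stand-in for one model forward pass that would also extend the preserved KV cache) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_embedding(probs, embedding_table):
    """ASSUMPTION: map the model's output back to the input embedding space
    as a probability-weighted average of embedding rows."""
    dim = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_table))
        for d in range(dim)
    ]


def should_switch(probs, stop_token_id, step,
                  max_latent_steps=8,
                  entropy_threshold=0.5,
                  stop_prob_threshold=0.5):
    """Decide whether to leave latent mode. Thresholds are illustrative."""
    if step >= max_latent_steps:          # bounded exploration: budget used up
        return True
    if probs[stop_token_id] >= stop_prob_threshold:  # stop-token probe
        return True
    return entropy(probs) <= entropy_threshold       # ASSUMED direction: confident => switch


def latent_then_explicit(step_fn, embedding_table, first_input, stop_token_id, **kwargs):
    """Run latent steps until should_switch fires.

    step_fn(input_vector) -> logits is a hypothetical stand-in for a model
    forward pass. Returns (latent steps taken, last distribution); explicit
    chain-of-thought decoding would start from this point.
    """
    x = first_input
    step = 0
    while True:
        probs = softmax(step_fn(x))
        step += 1
        if should_switch(probs, stop_token_id, step, **kwargs):
            return step, probs
        x = soft_embedding(probs, embedding_table)  # feed a continuous state, not a token
```

With a toy `step_fn` that always returns flat logits, the entropy never drops and the stop token never dominates, so the loop exits only when the step budget is reached. In a real model the KV cache built during the latent steps would be kept, so the explicit stage can attend to them.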
