Latent Reasoning via Sentence Embedding Prediction

Hyeonbin Hwang; Byeongguk Jeon; Seungone Kim; Jiyeon Kim; Hoyeon Chang; Sohee Yang; Seungpil Won; Dohaeng Lee; Youbin Ahn; Minjoon Seo

arXiv:2505.22202 · cs.CL · October 14, 2025



Open Access · 3 Reviews

TL;DR

This paper demonstrates that pretrained language models can be adapted to perform structured reasoning over sentence embeddings, achieving competitive results with reduced computational costs across various reasoning domains.

Contribution

The authors propose a novel framework for converting pretrained LMs into latent reasoning models operating in sentence embedding space, introducing two embedding paradigms and a new inference regime.

Findings

01. Contextual embeddings with continuous inference outperform traditional methods.
02. The approach reduces inference FLOPs by approximately 50%.
03. Latent reasoning models show promise in multiple reasoning domains.

Abstract

Autoregressive language models (LMs) generate one token at a time, yet human reasoning operates over higher-level abstractions: sentences, propositions, and concepts. This contrast raises a central question: can LMs likewise learn to reason over structured semantic units rather than raw token sequences? In this work, we investigate whether pretrained LMs can be lifted into such abstract reasoning spaces by building on their learned representations. We present a framework that adapts a pretrained token-level LM to operate in sentence space by autoregressively predicting continuous embeddings of next sentences. We explore two embedding paradigms inspired by classical representation learning: 1) semantic embeddings, learned via autoencoding to preserve surface meaning; and 2) contextual embeddings, trained via next-sentence prediction to encode anticipatory structure. We evaluate both…
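The core loop described in the abstract, predicting the next sentence's embedding and feeding it back as context, can be sketched as a toy. This is an illustration under stated assumptions, not the authors' implementation: `toy_predictor` is a hypothetical stand-in for the adapted LM (a mean-then-tanh map), embeddings are plain Python lists, and `nearest_sentence` is a hypothetical discretization standing in for "decode the predicted embedding to text, then re-embed it". With `snap=None`, predicted vectors are fed back unchanged, which is one reading of the "continuous inference" regime named in the findings.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]


def toy_predictor(context: List[Vector]) -> Vector:
    """Hypothetical stand-in for the adapted LM: maps the history of sentence
    embeddings to a predicted next-sentence embedding (mean, scaled, tanh)."""
    dim = len(context[0])
    mean = [sum(v[i] for v in context) / len(context) for i in range(dim)]
    return [math.tanh(0.9 * x) for x in mean]


def nearest_sentence(vec: Vector, bank: List[Vector]) -> Vector:
    """Hypothetical discretization: snap to the closest embedding in a fixed
    bank, standing in for 'decode to text, then re-embed'."""
    return min(bank, key=lambda b: sum((x - y) ** 2 for x, y in zip(vec, b)))


def latent_rollout(
    question: List[Vector],
    predict_next: Callable[[List[Vector]], Vector],
    num_steps: int,
    snap: Optional[Callable[[Vector], Vector]] = None,
) -> List[Vector]:
    """Autoregressive reasoning in sentence-embedding space.

    Each step predicts the next sentence embedding from everything so far and
    appends it to the context. With snap=None the raw continuous vector is fed
    back, so no text is produced between reasoning steps."""
    context = list(question)
    thoughts: List[Vector] = []
    for _ in range(num_steps):
        nxt = predict_next(context)
        if snap is not None:
            nxt = snap(nxt)  # optional discretization of the latent step
        thoughts.append(nxt)
        context.append(nxt)
    return thoughts


if __name__ == "__main__":
    question = [[1.0, -0.5], [0.2, 0.4]]  # toy embeddings of two input sentences
    bank = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy "sentence" embeddings
    continuous = latent_rollout(question, toy_predictor, num_steps=3)
    discrete = latent_rollout(
        question, toy_predictor, num_steps=3,
        snap=lambda v: nearest_sentence(v, bank),
    )
    print(continuous)
    print(discrete)
```

Passing a `snap` function restricts every step to a fixed set of sentence embeddings, which makes the contrast with the unconstrained continuous rollout easy to inspect on toy data.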

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- The experiments span diverse reasoning domains (mathematical, logical, commonsense, and planning), demonstrating the general applicability of the framework.
- The paper provides an interpretability tool (*SentenceLens*) for analyzing latent reasoning trajectories.
- Despite the complex design, the authors’ exposition is overall clear and logically structured.

Weaknesses

- The InfoNCE loss ratio and λ parameter for CTX-C are not specified, making the experiments hard to reproduce.
- The scope of fine-tuning (which components are updated) is unclear, and gradient flow is not described.
- No comparisons are made against strong latent-reasoning baselines such as CoCoMix, CoDi, or Token Assorted.
- Table 2 reports only a single inference mode; it should separately present accuracy and FLOPs across different reasoning modes.
- The empirical improvements over coco…
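Reviewer 01's first point concerns how an InfoNCE term is weighted in CTX-C training. The review does not state the objective, so the sketch below assumes, purely for illustration, a mean-squared regression loss on predicted next-sentence embeddings plus `lam` times an in-batch InfoNCE loss. The names `combined_loss`, `lam`, and `temperature` are hypothetical and are not taken from the paper; the point is only that the reported results depend on an unreported weighting of two terms on different scales.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(preds: List[Vector], targets: List[Vector], temperature: float = 0.1) -> float:
    """In-batch InfoNCE: prediction i should score highest against target i;
    every other target in the batch serves as a negative."""
    total = 0.0
    for i, p in enumerate(preds):
        logits = [cosine(p, t) / temperature for t in targets]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(preds)


def mse(preds: List[Vector], targets: List[Vector]) -> float:
    """Mean squared error, averaged over dimensions and then over the batch."""
    per_item = [
        sum((x - y) ** 2 for x, y in zip(p, t)) / len(p)
        for p, t in zip(preds, targets)
    ]
    return sum(per_item) / len(per_item)


def combined_loss(
    preds: List[Vector],
    targets: List[Vector],
    lam: float = 0.5,
    temperature: float = 0.1,
) -> float:
    """Hypothetical CTX-C-style objective: regression plus lam * InfoNCE.
    Both the form and the default lam are assumptions for illustration."""
    return mse(preds, targets) + lam * info_nce(preds, targets, temperature)
```

With matched pairs such as `p = [[1.0, 0.0], [0.0, 1.0]]`, `info_nce(p, p)` is close to zero, while swapping the two targets drives it to about 10 at the default `temperature=0.1`.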

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. The paper introduces a novel idea of performing autoregressive reasoning in the sentence embedding space, elevating the reasoning hierarchy of language models from the token level to the sentence level.
2. The proposed SentenceLens enables decoding intermediate latent states into natural language sentences, making the model’s “thought process” interpretable and analyzable.
3. By reasoning directly in the continuous embedding space, computational efficiency is significantly improved (1.5–2.5×…

Weaknesses

1. Although the paper proposes the concept of “sentence-level reasoning,” it does not sufficiently justify why sentence-level embeddings can necessarily capture the logical structures required for reasoning that token-level embeddings cannot.
2. The comparison is quite limited: it only benchmarks against one latent reasoning model (Coconut), and shows no significant advantage except on the Blocksworld task (where the performance gap is unusually large, raising concerns about possible evaluation…

Reviewer 03 · Rating 6 · Confidence 3

Strengths

I appreciate the comprehensive evaluation across multiple dynamic graph benchmarks and the practical applicability to real-world scenarios like social networks and traffic prediction. The edge-aware attention mechanism is a nice touch that effectively captures local topology changes, and the meta-learning framework provides good theoretical grounding for adaptation.

Weaknesses

The computational overhead isn't thoroughly analyzed, which concerns me for large-scale deployment. I also think the paper oversells the novelty a bit since similar meta-learning approaches exist in the literature, and the comparison with some recent temporal GNN methods like EvolveGCN seems incomplete.


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Machine Learning in Healthcare