Inference-Time Rethinking with Latent Thought Vectors for Math Reasoning

Deqian Kong; Minglu Zhao; Aoyang Qin; Bo Pang; Chenxin Tao; David Hartmann; Edouardo Honig; Dehong Xu; Amit Kumar; Matt Sarte; Chuan Li; Jianwen Xie; and Ying Nian Wu

arXiv:2602.06584·cs.CL·February 9, 2026

Open Access

TL;DR

This paper introduces Inference-Time Rethinking, a framework that iteratively refines math reasoning by optimizing latent thought vectors during inference, leading to improved accuracy with smaller models.

Contribution

The paper proposes a generative approach that factorizes reasoning into a continuous latent thought vector and a decoder that verbalizes the reasoning trace from it, enabling iterative self-correction at inference time.

Findings

1. A 0.2B-parameter model surpasses larger baselines on GSM8K.
2. Iterative rethinking improves reasoning accuracy significantly.
3. Effective math reasoning emerges from inference-time optimization rather than large model size.

Abstract

Standard chain-of-thought reasoning generates a solution in a single forward pass, committing irrevocably to each token and lacking a mechanism to recover from early errors. We introduce Inference-Time Rethinking, a generative framework that enables iterative self-correction by decoupling declarative latent thought vectors from procedural generation. We factorize reasoning into a continuous latent thought vector (what to reason about) and a decoder that verbalizes the trace conditioned on this vector (how to reason). Beyond serving as a declarative buffer, latent thought vectors compress the reasoning structure into a continuous representation that abstracts away surface-level token variability, making gradient-based optimization over reasoning strategies well-posed. Our prior model maps unstructured noise to a learned manifold of valid reasoning patterns, and at test time we employ a…
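The idea of refining a latent vector by gradient steps while the decoder stays fixed can be illustrated with a toy sketch. This is not the paper's method: the abstract is cut off before it describes the test-time procedure, so everything below is an assumption made for illustration. The "decoder" is a fixed linear map `W`, the prior is a standard Gaussian, the objective is a Gaussian log-posterior, and `refine_latent` is a hypothetical name.

```python
import numpy as np


def refine_latent(W, x, steps=500, seed=0):
    """Toy inference-time refinement of a latent vector z (illustrative only).

    Assumed model: x ~ N(W z, I) with prior z ~ N(0, I). The decoder W is
    frozen; only z is updated by gradient ascent on the log-posterior.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=W.shape[1])  # initial "thought" drawn from the prior

    # The log-posterior is a concave quadratic whose curvature is bounded by
    # ||W||_2^2 + 1, so a step of 1 / bound can never decrease the objective.
    step = 1.0 / (np.linalg.norm(W, 2) ** 2 + 1.0)

    history = []
    for _ in range(steps):
        resid = x - W @ z
        # Unnormalized log p(x | z) + log p(z)
        history.append(-0.5 * resid @ resid - 0.5 * z @ z)
        grad = W.T @ resid - z  # gradient of the log-posterior w.r.t. z
        z = z + step * grad     # one "rethinking" step
    return z, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(6, 3))  # hypothetical fixed decoder
    x = rng.normal(size=6)       # hypothetical observation to explain
    z, history = refine_latent(W, x)
    print("objective at first step:", history[0])
    print("objective at last step: ", history[-1])
```

In this linear-Gaussian toy the optimum has a closed form, which makes the iterative refinement easy to check. A real decoder is nonlinear and autoregressive, so neither the closed form nor the monotone-improvement guarantee carries over.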

Taxonomy

Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Machine Learning in Materials Science