Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Hengli Li; Chenxi Li; Tong Wu; Xuekai Zhu; Yuxuan Wang; Zhaoxin Yu; Eric Hanchen Jiang; Song-Chun Zhu; Zixia Jia; Ying Nian Wu; Zilong Zheng

arXiv:2505.13308 · cs.LG · January 21, 2026

TL;DR

LatentSeek is a test-time, latent-space reasoning method for LLMs: it improves performance through iterative policy gradient updates guided by self-generated rewards, without retraining the model.

Contribution

This paper presents LatentSeek, a framework that applies policy gradient in latent space for test-time reasoning. The authors report that it outperforms existing methods while remaining efficient and scalable.

Findings

01. LatentSeek outperforms Chain-of-Thought prompting and fine-tuning on reasoning benchmarks.

02. It converges within a few iterations on problems of average complexity.

03. The method is highly efficient and scalable for reasoning tasks.

Abstract

Reasoning ability, a core component of human intelligence, continues to pose a significant challenge for Large Language Models (LLMs) in the pursuit of AGI. Although model performance has improved under the training scaling law, significant challenges remain, particularly with respect to training algorithms, such as catastrophic forgetting, and the limited availability of novel training data. As an alternative, test-time scaling enhances reasoning performance by increasing test-time computation without parameter updating. Unlike prior methods in this paradigm focused on token space, we propose leveraging latent space for more effective reasoning and better adherence to the test-time scaling law. We introduce LatentSeek, a novel framework that enhances LLM reasoning through Test-Time Instance-level Adaptation (TTIA) within the model's latent space. Specifically, LatentSeek leverages…
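
The abstract is cut off above, so the following is only a toy sketch of the idea stated in the TL;DR (policy gradient updates applied to latent representations at test time, with the model weights left untouched). It is not the authors' implementation. Everything in it is an assumption made for illustration: the frozen matrix `W` stands in for an LM head, `toy_reward` is a hypothetical stand-in for a self-generated reward, greedy decoding is used for simplicity, and all function names and hyperparameters (`steps`, `lr`, `stop_at`) are invented here.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_from_latent(W, z):
    # Frozen stand-in for an LM head: one logit per vocabulary item.
    return [sum(w * x for w, x in zip(row, z)) for row in W]


def grad_log_prob(W, z, token):
    # d/dz log softmax(W z)[token] = W[token] - sum_v p_v * W[v]
    p = softmax(logits_from_latent(W, z))
    dim = len(z)
    expected = [sum(p[v] * W[v][j] for v in range(len(W))) for j in range(dim)]
    return [W[token][j] - expected[j] for j in range(dim)]


def decode(W, latents):
    # Greedy decoding: each latent vector yields its most likely token id.
    tokens = []
    for z in latents:
        logits = logits_from_latent(W, z)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens


def toy_reward(tokens, target):
    # Hypothetical reward in [-1, 0]; 0 means every token matches `target`.
    # A real system would score the decoded text without a known target.
    matches = sum(a == b for a, b in zip(tokens, target))
    return matches / len(target) - 1.0


def latent_policy_gradient(W, latents, reward_fn, steps=20, lr=0.5, stop_at=0.0):
    # Work on copies so the caller's latents are not mutated; W is never written.
    latents = [list(z) for z in latents]
    tokens = decode(W, latents)
    reward = reward_fn(tokens)
    best_latents, best_reward = [list(z) for z in latents], reward
    for _ in range(steps):
        if reward >= stop_at:
            break
        # REINFORCE-style step on the latents only: z += lr * R * grad log pi.
        for z, tok in zip(latents, tokens):
            grad = grad_log_prob(W, z, tok)
            for j in range(len(z)):
                z[j] += lr * reward * grad[j]
        tokens = decode(W, latents)
        reward = reward_fn(tokens)
        if reward > best_reward:
            best_latents, best_reward = [list(z) for z in latents], reward
    return best_latents, best_reward
```

A call such as `latent_policy_gradient(W, init_latents, lambda t: toy_reward(t, target))` returns the best latents seen and their reward. Tracking the best-so-far is a design choice of this sketch: with a single sequence-level reward the updates can overshoot, so the function never returns a result worse than the starting point.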

Code & Models

Repositories

bigai-nlco/latentseek
PyTorch · Official
