TL;DR
LatentSeek is a test-time, latent-space reasoning method for LLMs that improves performance through iterative policy gradient updates guided by self-generated rewards, without retraining the model.
Contribution
This paper presents LatentSeek, a novel framework that applies policy gradient in latent space for test-time reasoning. The authors report that it outperforms existing methods and is both efficient and scalable.
Findings
LatentSeek outperforms Chain-of-Thought prompting and fine-tuning on reasoning benchmarks.
It converges within a few iterations on problems of average complexity.
The method is highly efficient and scalable for reasoning tasks.
Abstract
Reasoning ability, a core component of human intelligence, continues to pose a significant challenge for Large Language Models (LLMs) in the pursuit of AGI. Although model performance has improved under the training scaling law, significant challenges remain, particularly with respect to training algorithms, such as catastrophic forgetting, and the limited availability of novel training data. As an alternative, test-time scaling enhances reasoning performance by increasing test-time computation without parameter updating. Unlike prior methods in this paradigm focused on token space, we propose leveraging latent space for more effective reasoning and better adherence to the test-time scaling law. We introduce LatentSeek, a novel framework that enhances LLM reasoning through Test-Time Instance-level Adaptation (TTIA) within the model's latent space. Specifically, LatentSeek leverages…
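To make the idea of instance-level policy gradient in latent space concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. It assumes a frozen linear "decoder" `W` standing in for the LLM, a hypothetical `toy_reward` that scores sampled tokens against a fixed target (the paper instead uses self-generated rewards), and a simple moving-average baseline. Only the latent vectors `z` for one problem instance are updated; `W` is never modified.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def toy_reward(tokens, target):
    # Hypothetical stand-in for a self-generated reward:
    # fraction of positions whose token matches a fixed target.
    return float(np.mean(tokens == target))


def latent_policy_gradient(z, W, reward_fn, steps=200, lr=0.5, seed=0):
    """REINFORCE-style updates on latents z of shape (T, d); decoder W (d, V) stays frozen."""
    rng = np.random.default_rng(seed)
    z = z.copy()          # leave the caller's latents untouched
    baseline = 0.0        # moving-average baseline (an assumption, for variance reduction)
    history = []
    for _ in range(steps):
        probs = softmax(z @ W)                                   # (T, V) token distributions
        tokens = np.array([rng.choice(len(p), p=p) for p in probs])
        reward = reward_fn(tokens)
        onehot = np.eye(W.shape[1])[tokens]                      # (T, V)
        # Gradient of log pi(tokens | z) with respect to z for a softmax policy.
        grad_z = (onehot - probs) @ W.T                          # (T, d)
        z += lr * (reward - baseline) * grad_z                   # gradient ascent on the latents
        baseline = 0.9 * baseline + 0.1 * reward
        history.append(reward)
    return z, history


# Toy usage: 4 latent positions, latent width 8, vocabulary of 5 tokens.
init_rng = np.random.default_rng(1)
T, d, V = 4, 8, 5
W = init_rng.normal(size=(d, V))       # frozen "model" weights
z0 = init_rng.normal(size=(T, d))      # initial latents for one instance
target = np.array([0, 1, 2, 3])        # hypothetical target used only by toy_reward

z_final, history = latent_policy_gradient(z0, W, lambda t: toy_reward(t, target))
greedy_tokens = (z_final @ W).argmax(axis=-1)
print("reward of greedy decode after adaptation:", toy_reward(greedy_tokens, target))
```

The design point the sketch illustrates is that all test-time computation goes into adjusting per-instance latents while the model weights stay fixed, which matches the "without parameter updating" framing in the abstract. Every name, shape, and hyperparameter here is illustrative.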
