Latent-Space Contrastive Reinforcement Learning for Stable and Efficient LLM Reasoning
Lianlei Shan, Han Chen, Yixuan Wang, Zhenjie Liu, Wei Li

TL;DR
This paper introduces DeepLatent Reasoning, a novel latent-space contrastive reinforcement learning framework that improves the stability, efficiency, and scalability of reasoning in large language models by shifting training to a continuous latent space.
Contribution
The paper proposes a latent-space RL approach with a lightweight assistant and contrastive learning to enhance reasoning in LLMs, addressing sample inefficiency and catastrophic forgetting.
Findings
DLR achieves more stable training convergence.
DLR supports longer-horizon reasoning chains.
DLR facilitates sustainable accumulation of reasoning capabilities.
Abstract
While Large Language Models (LLMs) demonstrate exceptional performance in surface-level text generation, their nature in handling complex multi-step reasoning tasks often remains one of "statistical fitting" rather than systematic logical deduction. Traditional Reinforcement Learning (RL) attempts to mitigate this by introducing a "think-before-speak" paradigm. However, applying RL directly in high-dimensional, discrete token spaces faces three inherent challenges: sample-inefficient rollouts, high gradient estimation variance, and the risk of catastrophic forgetting. To fundamentally address these structural bottlenecks, we propose DeepLatent Reasoning (DLR), a latent-space bidirectional contrastive reinforcement learning framework. This framework shifts the trial-and-error cost from expensive token-level full sequence generation to the continuous latent manifold.…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
