Latent-Space Contrastive Reinforcement Learning for Stable and Efficient LLM Reasoning

Lianlei Shan; Han Chen; Yixuan Wang; Zhenjie Liu; Wei Li

arXiv:2601.17275 · cs.LG · January 27, 2026

Open Access

TL;DR

This paper introduces DeepLatent Reasoning (DLR), a latent-space contrastive reinforcement learning framework intended to improve the stability, efficiency, and scalability of reasoning in large language models by moving training from the discrete token space into a continuous latent space.

Contribution

The paper proposes a latent-space RL approach with a lightweight assistant and contrastive learning to enhance reasoning in LLMs, addressing sample inefficiency and catastrophic forgetting.

Findings

01. DLR achieves more stable training convergence.

02. Supports longer-horizon reasoning chains.

03. Facilitates sustainable accumulation of reasoning capabilities.

Abstract

While Large Language Models (LLMs) demonstrate exceptional performance in surface-level text generation, their nature in handling complex multi-step reasoning tasks often remains one of "statistical fitting" rather than systematic logical deduction. Traditional Reinforcement Learning (RL) attempts to mitigate this by introducing a "think-before-speak" paradigm. However, applying RL directly in high-dimensional, discrete token spaces faces three inherent challenges: sample-inefficient rollouts, high gradient estimation variance, and the risk of catastrophic forgetting. To fundamentally address these structural bottlenecks, we propose DeepLatent Reasoning (DLR), a latent-space bidirectional contrastive reinforcement learning framework. This framework shifts the trial-and-error cost from expensive token-level full sequence generation to the continuous latent manifold.…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis