LaDi-RL: Latent Diffusion Reasoning Prevents Entropy Collapse in Reinforcement Learning

Haoqiang Kang; Yizhe Zhang; Nikki Lijing Kuang; Yi-An Ma; Lianhui Qin

arXiv:2602.01705 · cs.LG · May 19, 2026

TL;DR

LaDi-RL applies reinforcement learning to latent reasoning trajectories generated by a diffusion model, so that the policy optimizes global, trajectory-level reasoning decisions in language models rather than only local token choices.

Contribution

The paper proposes a hierarchical latent-text rollout method that improves reward estimation for latent diffusion policies in RL, leading to better reasoning performance.

Findings

01. Outperforms token-level RL by 9.4% on code generation

02. Achieves 5.7% improvement on math reasoning

03. Surpasses base model pass@k performance
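
The pass@k metric in the last finding is not defined on this page. As a minimal sketch, assuming the widely used unbiased estimator (draw n samples per problem, count the c correct ones, and estimate the chance that at least one of k samples is correct), it could be computed as below. The function name `pass_at_k` and the sample counts are illustrative assumptions, and the paper may compute the metric differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples drawn, c: samples judged correct, k: evaluation budget.
    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k);
    this page does not confirm which estimator the paper uses.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # With fewer than k incorrect samples, every size-k subset
    # must contain at least one correct sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts for illustration only (not results from the paper).
single_try = pass_at_k(n=10, c=3, k=1)   # reduces to c / n when k = 1
five_tries = pass_at_k(n=10, c=3, k=5)   # larger budget, higher estimate
```

A benchmark-level score would then be the mean of this estimate over all problems. Comparing pass@k at large k is a common way to check whether RL training has narrowed the set of problems a model can solve relative to its base model.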

Abstract

Reinforcement learning has become a central paradigm for improving LLM reasoning, but most existing methods optimize policies over discrete token sequences. This creates a mismatch between the optimization space and the structure of reasoning: many important decisions are semantic, global, and trajectory-level rather than local token choices. Continuous latent-space RL offers a promising alternative by allowing policies to explore higher-level reasoning representations. However, simply moving to latent space is not sufficient. The resulting policy must model a complex, multi-modal distribution over valid reasoning trajectories. We therefore propose Latent Diffusion Reasoning with Reinforcement Learning (LaDi-RL), where a diffusion model generates latent reasoning trajectories through iterative denoising. This formulation enables structured exploration and expressive distribution…

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications