LaDi-RL: Latent Diffusion Reasoning Prevents Entropy Collapse in Reinforcement Learning
Haoqiang Kang, Yizhe Zhang, Nikki Lijing Kuang, Yi-An Ma, Lianhui Qin

TL;DR
LaDi-RL trains a diffusion model to generate latent reasoning trajectories and optimizes it with reinforcement learning, so that exploration happens over global, semantic reasoning decisions rather than local token choices.
Contribution
The paper proposes a hierarchical latent-text rollout method that improves reward estimation for latent diffusion policies in RL, leading to better reasoning performance.
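The summary does not spell out how the hierarchical latent-text rollout works. One plausible reading, sketched below as a generic two-level Monte Carlo estimator rather than the paper's actual procedure, is to sample several latents, decode several texts from each, and average the text rewards per latent. Every name here (`estimate_latent_rewards`, the toy sampler, decoder, and reward) is hypothetical.

```python
import random
from statistics import mean

def estimate_latent_rewards(sample_latent, decode_text, reward_fn,
                            num_latents=4, texts_per_latent=3, seed=0):
    """Two-level Monte Carlo estimate: latents on top, text rollouts below."""
    rng = random.Random(seed)
    latents, values = [], []
    for _ in range(num_latents):
        z = sample_latent(rng)                       # upper level: one latent sample
        rewards = [reward_fn(decode_text(z, rng))    # lower level: several decodes of z
                   for _ in range(texts_per_latent)]
        latents.append(z)
        values.append(mean(rewards))                 # averaged reward for this latent
    baseline = mean(values)
    advantages = [v - baseline for v in values]      # reward relative to the group mean
    return latents, values, advantages

# Toy stand-ins (assumptions for illustration only, not real models).
def toy_sample_latent(rng):
    return rng.gauss(0.0, 1.0)

def toy_decode_text(z, rng):
    return z + rng.gauss(0.0, 0.5)   # a noisy "decode" of the latent

def toy_reward(text):
    return 1.0 if text > 0 else 0.0  # binary correctness signal

latents, values, advantages = estimate_latent_rewards(
    toy_sample_latent, toy_decode_text, toy_reward)
```

Averaging over several decodes reduces the variance of each latent's reward estimate, which is the kind of benefit the contribution statement attributes to the method.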
Findings
Outperforms token-level RL by 9.4% on code generation
Achieves a 5.7% improvement on math reasoning
Surpasses the base model's pass@k performance
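For readers unfamiliar with pass@k: it is the probability that at least one of k sampled solutions is correct. The sketch below uses the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), computed from n samples of which c are correct. The summary does not say which estimator the paper uses, so treat this as an assumption.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k from n samples with c correct: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so every size-k subset then contains a correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 samples of which 3 are correct, `pass_at_k(10, 3, 1)` is about 0.3 and `pass_at_k(10, 3, 5)` is 11/12.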
Abstract
Reinforcement learning has become a central paradigm for improving LLM reasoning, but most existing methods optimize policies over discrete token sequences. This creates a mismatch between the optimization space and the structure of reasoning: many important decisions are semantic, global, and trajectory-level rather than local token choices. Continuous latent-space RL offers a promising alternative by allowing policies to explore higher-level reasoning representations. However, simply moving to latent space is not sufficient. The resulting policy must model a complex, multi-modal distribution over valid reasoning trajectories. We therefore propose Latent Diffusion Reasoning with Reinforcement Learning (LaDi-RL), where a diffusion model generates latent reasoning trajectories through iterative denoising. This formulation enables structured exploration and expressive distribution…
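To make "iterative denoising" concrete, here is a toy sketch under stated assumptions: the latent is a short list of floats, the two values in `MODES` stand in for distinct valid reasoning strategies, and `toy_denoiser` replaces a learned network. It illustrates the general idea of refining noise toward a multi-modal target and is not the LaDi-RL sampler.

```python
import random

MODES = [-2.0, 2.0]  # hypothetical stand-ins for two distinct reasoning modes

def toy_denoiser(x):
    """Predict a clean latent per coordinate: here, simply the nearest mode."""
    return [min(MODES, key=lambda m: abs(m - xi)) for xi in x]

def denoise_latent(dim=4, steps=10, noise_scale=0.3, seed=0):
    """Refine a random latent toward the modes over a fixed number of steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]    # start from pure noise
    for t in range(steps, 0, -1):
        x0_hat = toy_denoiser(x)                      # current guess of the clean latent
        sigma = noise_scale * (t - 1) / steps         # injected noise shrinks to 0 at the last step
        x = [xi + (x0i - xi) / t + sigma * rng.gauss(0.0, 1.0)
             for xi, x0i in zip(x, x0_hat)]           # move a fraction 1/t toward the guess
    return x
```

Early steps take small moves with added noise, so a coordinate can still switch modes. The final step moves all the way to the predicted clean latent, and different seeds can settle on different modes.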
Taxonomy
Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications
