Efficient and Stable Reinforcement Learning for Diffusion Language Models

Jiawei Liu; Xiting Wang; Yuanyuan Zhong; Defu Lian; Yu Yang

arXiv:2602.08905·cs.AI·February 10, 2026

TL;DR

This paper introduces Spatio-Temporal Pruning (STP), a novel framework that enhances the efficiency and stability of reinforcement learning applied to diffusion-based large language models by reducing redundancy in the generative process.

Contribution

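The paper presents STP, a pruning framework for reinforcement learning (RL) on diffusion-based large language models (dLLMs). By compressing redundancy in the generative process, STP reduces the variance of the log-likelihood estimation, which leads to more stable and more efficient training.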

Findings

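01

Theoretical analysis shows that STP strictly reduces the variance of the log-likelihood estimation, which supports more stable policy updates.

02

In experiments, STP is more efficient than state-of-the-art baselines.

03

In the same experiments, STP also reaches higher accuracy than those baselines.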

Abstract

Reinforcement Learning (RL) is crucial for unlocking the complex reasoning capabilities of Diffusion-based Large Language Models (dLLMs). However, applying RL to dLLMs faces unique challenges in efficiency and stability. To address these challenges, we propose Spatio-Temporal Pruning (STP), a framework designed to simultaneously improve the efficiency and stability of RL for dLLMs. STP compresses the redundancy in the generative process through: (1) spatial pruning, which constrains the exploration space using static priors; and (2) temporal pruning, which bypasses redundant late-stage refinement steps. Our theoretical analysis demonstrates that STP strictly reduces the variance of the log-likelihood estimation, thereby ensuring more stable policy updates. Extensive experiments demonstrate that STP surpasses state-of-the-art baselines in both efficiency and accuracy.…
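
To make the variance claim concrete, here is a toy sketch. It is not the paper's estimator or algorithm: it assumes, purely for illustration, that a log-likelihood estimate is a sum of independent noisy per-step terms and that the late refinement steps contribute noise but no signal. Under that assumption, skipping the late steps leaves the expected value unchanged and removes their share of the variance. All names (`estimate_loglik`, `estimator_variance`, `step_means`, `noise_std`, `keep_steps`) and all numbers are hypothetical.

```python
import random
import statistics


def estimate_loglik(step_means, noise_std, keep_steps, rng):
    """One Monte Carlo estimate: sum of noisy per-step terms over the kept steps.

    Toy model only: each step contributes its mean plus independent Gaussian noise.
    """
    return sum(m + rng.gauss(0.0, noise_std) for m in step_means[:keep_steps])


def estimator_variance(step_means, noise_std, keep_steps, n_samples=2000, seed=0):
    """Empirical variance of the toy estimator over n_samples independent draws."""
    rng = random.Random(seed)
    samples = [
        estimate_loglik(step_means, noise_std, keep_steps, rng)
        for _ in range(n_samples)
    ]
    return statistics.variance(samples)


# Hypothetical schedule: 16 denoising steps, where the last 8 are assumed
# redundant (zero mean contribution) but still add estimation noise.
step_means = [-1.0] * 8 + [0.0] * 8
noise_std = 0.5

full_var = estimator_variance(step_means, noise_std, keep_steps=16)
pruned_var = estimator_variance(step_means, noise_std, keep_steps=8)

print(f"variance with all 16 steps:     {full_var:.3f}")
print(f"variance with the first 8 only: {pruned_var:.3f}")
```

In this toy model the analytic variances are 16 × 0.25 = 4.0 for the full schedule and 8 × 0.25 = 2.0 for the truncated one, so the empirical values should land near those figures. How the paper identifies redundant steps, and how spatial pruning uses static priors, is not specified in the abstract and is not modeled here.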

Taxonomy

Topics: Topic Modeling · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications