DiRL: An Efficient Post-Training Framework for Diffusion Language Models

Ying Zhu; Jiaxin Wan; Xiaoran Liu; Siyang He; Qiqi Wang; Xu Guo; Tianyi Liang; Zengfeng Huang; Ziwei He; Xipeng Qiu

arXiv:2512.22234 · cs.LG · January 7, 2026

TL;DR

DiRL introduces an efficient post-training framework for diffusion language models, improving performance on complex reasoning tasks like mathematics through optimized training and inference strategies.

Contribution

The paper presents DiRL, a novel post-training framework combining blockwise training and optimized inference, along with DiPO, an unbiased policy optimization method for dLLMs.

Findings

1. Achieves state-of-the-art math performance among dLLMs.
2. Surpasses Qwen2.5 series on multiple benchmarks.
3. Enables efficient online model updates.

Abstract

Diffusion Language Models (dLLMs) have emerged as promising alternatives to Auto-Regressive (AR) models. While recent efforts have validated their pre-training potential and accelerated inference speeds, the post-training landscape for dLLMs remains underdeveloped. Existing methods suffer from computational inefficiency and objective mismatches between training and inference, severely limiting performance on complex reasoning tasks such as mathematics. To address this, we introduce DiRL, an efficient post-training framework that tightly integrates FlexAttention-accelerated blockwise training with LMDeploy-optimized inference. This architecture enables a streamlined online model update loop, facilitating efficient two-stage post-training (Supervised Fine-Tuning followed by Reinforcement Learning). Building on this framework, we propose DiPO, the first unbiased Group Relative Policy…
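As an illustration of what "blockwise" can mean at the attention level, the sketch below builds a block-causal mask: a position attends to every position in its own block and in all earlier blocks. This is an assumed, commonly used pattern, not the mask the paper specifies (the abstract does not give one), and the helper name `block_causal_mask` and the sizes used are hypothetical. The same per-position predicate is the kind of rule that PyTorch's FlexAttention accepts as a mask function, which is why it is shown in plain Python here.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Return mask[q][k] == True where query position q may attend to key position k.

    Assumed rule (illustrative only): attention is bidirectional inside a
    block and causal across blocks.
    """
    if seq_len < 0 or block_size <= 0:
        raise ValueError("seq_len must be >= 0 and block_size must be > 0")
    return [
        # k is visible to q when k's block index is not later than q's block index.
        [(k // block_size) <= (q // block_size) for k in range(seq_len)]
        for q in range(seq_len)
    ]


# Hypothetical sizes chosen only to make the pattern visible.
mask = block_causal_mask(seq_len=6, block_size=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

With a block size of 2, positions 0 and 1 see only the first block, positions 2 and 3 see the first two blocks, and positions 4 and 5 see the whole sequence.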

Code & Models

Models

OpenMOSS-Team/DiRL-8B-Instruct (Hugging Face model; 8 downloads, 13 likes)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning