TL;DR
This paper investigates the impact of block size in multi-domain reinforcement learning for diffusion large language models, proposing a new benchmark, dataset, and a cross-domain training method.
Contribution
It introduces a domain conflict perspective on block size, constructs a new dataset and benchmark, and proposes a cross-domain post-training method for dLLMs.
Findings
The domain block size conflict significantly affects RL post-training effectiveness.
The Block-R1 benchmark enables evaluation across multiple domains and RL algorithms.
The proposed method improves cross-domain performance using sample-level block size optimization.
Abstract
Reinforcement learning (RL) has recently been widely applied in post-training of diffusion large language models (dLLMs) to enhance reasoning with block-wise semi-autoregressive generation. Block size has therefore become a vital factor in dLLMs, since it determines the parallel decoding granularity and affects the rollout trajectories during RL optimisation with methods such as GRPO. Instead of investigating the effect of block size during inference on individual domains, this paper studies block size from a domain conflict perspective for dLLM RL post-training in multi-domain scenarios. The main contributions are: (1) a formulation of domain block size conflict in multi-domain RL for dLLMs, which largely affects the post-training effectiveness of rollout-based RL methods; (2) a novel dataset, Block-R1-41K, constructed with a best-improved training block size for each sample, which also…
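To make the role of block size concrete, here is a minimal sketch of how a block size partitions the generation positions in block-wise semi-autoregressive decoding: blocks are filled left to right, while positions inside one block are candidates for parallel decoding. The function name `block_schedule` and the parameter `gen_length` are hypothetical, chosen for illustration only; this is not the paper's implementation and it performs no actual denoising.

```python
from typing import List


def block_schedule(gen_length: int, block_size: int) -> List[range]:
    """Split positions 0..gen_length-1 into consecutive blocks.

    Illustrative assumption: blocks are processed in order, and the
    positions within a block may be decoded in parallel.
    """
    if gen_length <= 0 or block_size <= 0:
        raise ValueError("gen_length and block_size must be positive")
    return [
        range(start, min(start + block_size, gen_length))
        for start in range(0, gen_length, block_size)
    ]


if __name__ == "__main__":
    # Smaller blocks mean more sequential steps; larger blocks mean
    # more positions decoded together.
    for size in (4, 8, 16):
        blocks = block_schedule(16, size)
        print(f"block_size={size}: {len(blocks)} sequential block(s)")
```

With a fixed generation length, a smaller block size yields more sequential blocks (closer to autoregressive decoding), while a larger one yields fewer, wider blocks. This is the granularity trade-off that the abstract says shapes the rollout trajectories.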
