MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization

Chenglong Wang; Yang Gan; Hang Zhou; Chi Hu; Yongyu Mu; Kai Song; Murun Yang; Bei Li; Chunliang Zhang; Tongran Liu; Jingbo Zhu; Zhengtao Yu; Tong Xiao

arXiv:2510.21473·cs.CL·October 27, 2025

TL;DR

This paper introduces MRO, a multi-reward optimization method that improves reasoning in diffusion language models by strengthening token correlation during denoising, yielding better reasoning performance and faster sampling.

Contribution

The paper proposes a novel MRO approach that explicitly optimizes token correlation during denoising, improving reasoning and sampling efficiency in diffusion language models.

Findings

01. MRO improves reasoning performance on benchmarks.

02. MRO achieves faster sampling speeds.

03. Enhancing token correlation benefits diffusion language models.

Abstract

Recent advances in diffusion language models (DLMs) have presented a promising alternative to traditional autoregressive large language models (LLMs). However, DLMs still lag behind LLMs in reasoning performance, especially as the number of denoising steps decreases. Our analysis reveals that this shortcoming arises primarily from the independent generation of masked tokens across denoising steps, which fails to capture token correlation. In this paper, we define two types of token correlation, intra-sequence correlation and inter-sequence correlation, and demonstrate that enhancing these correlations improves reasoning performance. To this end, we propose a Multi-Reward Optimization (MRO) approach, which encourages DLMs to consider token correlation during the denoising process. More specifically, our MRO approach leverages test-time scaling, rejection sampling, and reinforcement…
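To make the independence problem concrete, the following is a minimal toy sketch, not the paper's method or code. It assumes a hypothetical two-token joint distribution (the phrases and probabilities are invented for illustration) and compares it with what is obtained when each masked position is sampled independently from its own marginal. The helper names `marginals`, `factorized`, and `incoherent_mass` are illustrative assumptions.

```python
from itertools import product

# Hypothetical joint distribution over two masked positions (illustrative numbers only).
joint = {
    ("New", "York"): 0.5,
    ("Los", "Angeles"): 0.5,
}

def marginals(joint):
    """Per-position marginal distributions of a joint over token tuples."""
    n = len(next(iter(joint)))
    margs = [dict() for _ in range(n)]
    for tokens, p in joint.items():
        for i, tok in enumerate(tokens):
            margs[i][tok] = margs[i].get(tok, 0.0) + p
    return margs

def factorized(margs):
    """Distribution obtained when every position is sampled independently."""
    dist = {}
    for combo in product(*(m.items() for m in margs)):
        tokens = tuple(tok for tok, _ in combo)
        p = 1.0
        for _, q in combo:
            p *= q
        dist[tokens] = p
    return dist

def incoherent_mass(joint, indep):
    """Probability that independent sampling yields a sequence the joint never produces."""
    return sum(p for tokens, p in indep.items() if joint.get(tokens, 0.0) == 0.0)

margs = marginals(joint)
indep = factorized(margs)
print(incoherent_mass(joint, indep))  # 0.5
```

In this toy case, half of the probability mass under independent sampling falls on mixed sequences such as "New Angeles" that the joint distribution never generates. Filling many masked positions in a single step resembles the factorized case, which is one way to read the abstract's claim that quality drops as the number of denoising steps decreases.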

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Language and cultural evolution