Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Zemin Huang; Zhiyang Chen; Zijun Wang; Tiancheng Li; Guo-Jun Qi

arXiv:2505.10446·cs.CL·November 3, 2025

TL;DR

This paper introduces DCoLT, a novel reasoning framework for diffusion language models that enables bidirectional, non-linear reasoning, significantly improving performance on math and code generation tasks.

Contribution

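The paper presents DCoLT, a reasoning method for diffusion language models that treats each intermediate reverse-diffusion step as a "thinking" action and optimizes the entire reasoning trajectory with outcome-based reinforcement learning, in contrast to the causal, linear process of traditional Chain-of-Thought approaches.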

Findings

01 DCoLT-reinforced DLMs outperform other models on math and code generation tasks.

02 LLaDA with DCoLT improves reasoning accuracy by up to 19.5%.

03 The approach applies outcome-based RL to the reverse diffusion process, so diffusion models can be trained for complex reasoning tasks.

Abstract

We introduce the Diffusion Chain of Lateral Thought (DCoLT), a reasoning framework for diffusion language models. DCoLT treats each intermediate step in the reverse diffusion process as a latent "thinking" action and optimizes the entire reasoning trajectory to maximize the reward on the correctness of the final answer with outcome-based Reinforcement Learning (RL). Unlike traditional Chain-of-Thought (CoT) methods that follow a causal, linear thinking process, DCoLT allows bidirectional, non-linear reasoning with no strict rule on grammatical correctness amid its intermediate steps of thought. We implement DCoLT on two representative Diffusion Language Models (DLMs). First, we choose SEDD as a representative continuous-time discrete diffusion model, where its concrete score derives a probabilistic policy to maximize the RL reward over the entire sequence of intermediate diffusion…
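The core idea in the abstract, treating every intermediate denoising step as an action and rewarding only the final answer, can be illustrated with a toy sketch. The code below is not the paper's implementation: it assumes a tiny tabular policy (hypothetical `pos_logits` and `tok_logits` tables), a one-token-per-step unmasking schedule, an exact-match reward, and plain REINFORCE with a moving-average baseline. All names here are illustrative.

```python
import math
import random

MASK = None  # hypothetical marker for a still-masked position


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_trajectory(pos_logits, tok_logits, rng):
    """Unmask one position per step, in any order (not left-to-right).

    Returns the final sequence and the gradient of the trajectory's total
    log-probability with respect to both logit tables.
    """
    n, v = len(tok_logits), len(tok_logits[0])
    seq = [MASK] * n
    g_pos = [0.0] * n
    g_tok = [[0.0] * v for _ in range(n)]
    for _ in range(n):
        masked = [i for i, t in enumerate(seq) if t is MASK]
        # Action part 1: which masked position to reveal next.
        p_pos = softmax([pos_logits[i] for i in masked])
        pos = rng.choices(masked, weights=p_pos)[0]
        for i, p in zip(masked, p_pos):
            g_pos[i] += (1.0 if i == pos else 0.0) - p  # d log softmax
        # Action part 2: which token to write there.
        p_tok = softmax(tok_logits[pos])
        tok = rng.choices(range(v), weights=p_tok)[0]
        for k, p in enumerate(p_tok):
            g_tok[pos][k] += (1.0 if k == tok else 0.0) - p
        seq[pos] = tok
    return seq, g_pos, g_tok


def train(target, vocab_size, steps=2000, lr=0.5, seed=0):
    """REINFORCE with an outcome-only reward on the final sequence."""
    rng = random.Random(seed)
    n = len(target)
    pos_logits = [0.0] * n
    tok_logits = [[0.0] * vocab_size for _ in range(n)]
    baseline = 0.0
    for _ in range(steps):
        seq, g_pos, g_tok = sample_trajectory(pos_logits, tok_logits, rng)
        reward = 1.0 if seq == target else 0.0  # no per-step supervision
        adv = reward - baseline
        for i in range(n):
            pos_logits[i] += lr * adv * g_pos[i]
            for k in range(vocab_size):
                tok_logits[i][k] += lr * adv * g_tok[i][k]
        baseline = 0.9 * baseline + 0.1 * reward
    return pos_logits, tok_logits
```

Calling `train([1, 0], vocab_size=2)` runs the loop on a two-token target. The point of the sketch is that the reward touches only the finished sequence, while the gradient credits every unmasking decision along the way; a real DLM would replace the tables with a network conditioned on the partially unmasked sequence.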

Taxonomy

Topics: Language and cultural evolution · Topic Modeling

Methods: Diffusion · Shrink and Fine-Tune