Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang, Zhiyang Chen, Zijun Wang, Tiancheng Li, Guo-Jun Qi

TL;DR
DCoLT is a reasoning framework for diffusion language models that allows bidirectional, non-linear reasoning and improves performance on math and code generation tasks.
Contribution
DCoLT treats each intermediate step of the reverse diffusion process as a latent "thinking" action and optimizes the entire reasoning trajectory with outcome-based reinforcement learning, in contrast to the causal, linear process of traditional Chain-of-Thought approaches.
Findings
DCoLT-reinforced DLMs outperform other models on math and code generation tasks.
LLaDA with DCoLT improves reasoning accuracy by up to 19.5%.
The approach effectively leverages diffusion models for complex reasoning tasks.
Abstract
We introduce the Diffusion Chain of Lateral Thought (DCoLT), a reasoning framework for diffusion language models. DCoLT treats each intermediate step in the reverse diffusion process as a latent "thinking" action and optimizes the entire reasoning trajectory to maximize the reward on the correctness of the final answer with outcome-based Reinforcement Learning (RL). Unlike traditional Chain-of-Thought (CoT) methods that follow a causal, linear thinking process, DCoLT allows bidirectional, non-linear reasoning with no strict rule on grammatical correctness amid its intermediate steps of thought. We implement DCoLT on two representative Diffusion Language Models (DLMs). First, we choose SEDD as a representative continuous-time discrete diffusion model, where its concrete score derives a probabilistic policy to maximize the RL reward over the entire sequence of intermediate diffusion…
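The idea of treating every reverse-diffusion step as an action and rewarding only the final answer can be illustrated with a toy sketch. This is not the paper's SEDD or LLaDA implementation: the function names (`sample_trajectory`, `reinforce_surrogate`), the fixed position logits, the per-position token probabilities, and the mean-reward baseline are all assumptions made for illustration. The sketch starts from a fully masked sequence, unmasks positions in a sampled (not left-to-right) order, accumulates the log-probability of every action, and forms a REINFORCE-style surrogate loss from outcome rewards.

```python
import math
import random


def sample_trajectory(length, vocab, unmask_logits, token_probs, rng):
    """Roll out one toy reverse-diffusion trajectory from a fully masked sequence.

    Each step is one "thinking" action: choose a still-masked position
    (any order, not left-to-right), then choose a token for it.
    Returns the final sequence and the summed log-probability of all actions.
    """
    seq = [None] * length  # None stands in for a [MASK] token
    log_prob = 0.0
    while any(tok is None for tok in seq):
        masked = [i for i, tok in enumerate(seq) if tok is None]
        # Softmax restricted to the positions that are still masked.
        weights = [math.exp(unmask_logits[i]) for i in masked]
        total = sum(weights)
        pos_probs = [w / total for w in weights]
        pos = rng.choices(masked, weights=pos_probs)[0]
        log_prob += math.log(pos_probs[masked.index(pos)])
        # Token choice for the selected position (hypothetical fixed policy).
        tok = rng.choices(vocab, weights=token_probs[pos])[0]
        log_prob += math.log(token_probs[pos][vocab.index(tok)])
        seq[pos] = tok
    return seq, log_prob


def reinforce_surrogate(log_probs, rewards):
    """Outcome-based REINFORCE surrogate with a mean-reward baseline (assumed).

    Minimizing this value raises the log-probability of trajectories whose
    final answer scored above the group average and lowers the others.
    """
    baseline = sum(rewards) / len(rewards)
    weighted = sum((r - baseline) * lp for lp, r in zip(log_probs, rewards))
    return -weighted / len(rewards)


# Toy usage: reward 1.0 only when the final sequence equals the target.
rng = random.Random(0)
vocab = ["1", "2", "3"]
unmask_logits = [0.0, 0.5, -0.5]
token_probs = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
target = ["1", "2", "3"]

rollouts = [
    sample_trajectory(3, vocab, unmask_logits, token_probs, rng) for _ in range(8)
]
rewards = [1.0 if seq == target else 0.0 for seq, _ in rollouts]
loss = reinforce_surrogate([lp for _, lp in rollouts], rewards)
```

In a real model the position and token distributions would come from the network at each step and the surrogate would be differentiated with respect to its parameters; here they are fixed lists so the sketch stays self-contained.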
Taxonomy
Topics: Language and cultural evolution · Topic Modeling
Methods: Diffusion · Shrink and Fine-Tune
