C$^2$DLM: Causal Concept-Guided Diffusion Large Language Models

Kairong Han; Nuanqiao Shan; Ziyu Zhao; Zijing Hu; Xinpeng Dong; Junjian Ye; Lujia Pan; Fei Wu; Kun Kuang

arXiv:2511.22146 · cs.CL · December 1, 2025

Open Access

TL;DR

C$^2$DLM introduces a causal concept-guided approach to diffusion language models, enhancing reasoning by explicitly modeling causal relationships, resulting in improved performance and training efficiency.

Contribution

The paper proposes a novel causal concept-guided diffusion language model that explicitly incorporates causal structures into attention mechanisms, improving reasoning capabilities.

Findings

01

Achieves a 12% improvement on the COT-OrderPerturb task with 3.2x faster training

02

Gains an average of 1.31% across six reasoning tasks

03

Effectively models causal relationships between concepts

Abstract

Autoregressive (AR) language models and Diffusion Language Models (DLMs) constitute the two principal paradigms of large language models. However, both paradigms suffer from insufficient reasoning capabilities. Human reasoning inherently relies on causal knowledge and thought, which are reflected in natural language. But in the AR paradigm, language is modeled as next token prediction (a strictly left-to-right, token-by-token order), whereas natural language itself exhibits more flexible causal structures. In the DLM paradigm, the attention mechanism is fully connected, which entirely disregards causal order. To fill this gap, we propose a Causal Concept-Guided Diffusion Language Model (C$^2$DLM). Starting from DLM's fully connected attention, C$^2$DLM first obtains a concept-level…
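
The contrast the abstract draws between the two paradigms can be pictured with attention masks. The following is only an illustrative sketch, not the paper's implementation: the function names `causal_mask`, `full_mask`, and `render` are hypothetical, and the concept-level guidance that C$^2$DLM adds on top of full attention is not modeled here because the abstract above is truncated before describing it.

```python
def causal_mask(n):
    """AR-style mask (assumed illustration): position i may attend only to j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def full_mask(n):
    """DLM-style mask (assumed illustration): every position attends to every position."""
    return [[True] * n for _ in range(n)]


def render(mask):
    """Format a boolean mask as rows of 1s and 0s for quick inspection."""
    return "\n".join(" ".join("1" if v else "0" for v in row) for row in mask)


if __name__ == "__main__":
    print("AR (strictly left-to-right):")
    print(render(causal_mask(4)))
    print()
    print("DLM (fully connected):")
    print(render(full_mask(4)))
```

The lower-triangular mask enforces one fixed order, while the all-ones mask encodes no order at all; these are the two extremes the abstract describes.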

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare