The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check

Qingyu Lu; Liang Ding; Kanjian Zhang; Jinxia Zhang; Dacheng Tao

arXiv:2601.12979 · cs.CL · April 27, 2026

TL;DR

This paper critically evaluates diffusion-based large language models for agentic workflows, revealing significant limitations in their ability to perform reliable, long-horizon planning and precise tool use despite efficiency claims.

Contribution

The study provides a comprehensive assessment of dLLMs on agentic tasks, introduces DiffuAgent for multi-agent evaluation, and highlights the need to integrate causal and logical reasoning.

Findings

01. dLLMs fail at long-horizon planning under temporal feedback.

02. dLLMs struggle to maintain symbolic precision in tool-calling.

03. dLLMs are effective in non-causal roles such as memory summarization.
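"Symbolic precision" matters because a tool call only executes if it is exactly well-formed. The sketch below is a minimal, hypothetical checker and not the BFCL evaluator, whose rules are not described on this page. It assumes calls are emitted as a JSON object with `name` and `arguments` fields. The `get_weather` schema and the `check_tool_call` function are made up for illustration.

```python
import json

# Hypothetical tool schema: function name -> {argument name: expected Python type}.
TOOLS = {
    "get_weather": {"city": str, "days": int},
}


def check_tool_call(raw: str) -> list[str]:
    """Return a list of problems with a generated tool call; empty means well-formed."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        # A single misplaced brace or quote makes the whole call unusable.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["top-level value is not an object"]
    name = call.get("name")
    if name not in TOOLS:
        return [f"unknown function: {name!r}"]
    args = call.get("arguments")
    if not isinstance(args, dict):
        return ["'arguments' is missing or not an object"]
    problems = []
    schema = TOOLS[name]
    for arg, expected in schema.items():
        if arg not in args:
            problems.append(f"missing argument: {arg}")
        # Exact type match, so e.g. True is not accepted where an int is expected.
        elif type(args[arg]) is not expected:
            problems.append(
                f"{arg}: expected {expected.__name__}, got {type(args[arg]).__name__}"
            )
    for arg in args:
        if arg not in schema:
            problems.append(f"unexpected argument: {arg}")
    return problems


# Example: a numeric argument emitted as a string is a typical precision slip.
print(check_tool_call('{"name": "get_weather", "arguments": {"city": "Paris", "days": "3"}}'))
# → ['days: expected int, got str']
```

A check like this is all-or-nothing by design: a call that is almost right, such as one with a dropped closing brace or a misspelled function name, still fails, which is the kind of error the finding above refers to.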

Abstract

The pursuit of real-time agentic interaction has driven interest in Diffusion-based Large Language Models (dLLMs) as alternatives to auto-regressive backbones, promising to break the sequential latency bottleneck. However, do such efficiency gains translate into effective agentic behavior? In this work, we present a comprehensive evaluation of dLLMs (e.g., LLaDA, Dream) across two distinct agentic paradigms: Embodied Agents (requiring long-horizon planning) and Tool-Calling Agents (requiring precise formatting). Contrary to the efficiency hype, our results on Agentboard and BFCL reveal a "bitter lesson": current dLLMs fail to serve as reliable agentic backbones, frequently leading to systematic failure. (1) In Embodied settings, dLLMs suffer from repeated attempts, failing to branch under temporal feedback. (2) In Tool-Calling settings, dLLMs fail to maintain symbolic precision (e.g.…
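The "repeated attempts" failure in embodied settings can be illustrated with a simple loop detector over an agent's action history. This is a hypothetical heuristic, not the paper's metric. The names `longest_repeat_run` and `is_stuck`, the threshold of 3, and the ALFWorld-style action strings are all assumptions chosen for illustration.

```python
from typing import Sequence


def longest_repeat_run(actions: Sequence[str]) -> int:
    """Length of the longest run of identical consecutive actions (0 for an empty history)."""
    longest = current = 0
    previous = None
    for action in actions:
        # Extend the run if the agent repeats its last action, otherwise start a new run.
        current = current + 1 if action == previous else 1
        previous = action
        longest = max(longest, current)
    return longest


def is_stuck(actions: Sequence[str], threshold: int = 3) -> bool:
    """Flag a trajectory in which the same action is issued `threshold` or more times in a row."""
    return longest_repeat_run(actions) >= threshold


trajectory = ["go to desk 1", "open drawer 1", "open drawer 1", "open drawer 1"]
print(is_stuck(trajectory))  # → True
```

An agent that re-issues the same action after receiving the same observation, instead of branching to a new plan, is exactly what such a check would flag.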
