Focus on the Core: Empowering Diffusion Large Language Models by Self-Contrast

Jinyuan Feng; Xin Yu; Yiqun Chen; Xiaochi Wei; Yan Gao; Yi Wu; Yao Hu; Zhiqiang Pu

arXiv:2605.01373·cs.CL·May 5, 2026

TL;DR

This paper introduces FoCore, a novel decoding strategy for Diffusion Large Language Models that leverages high-information-density tokens to improve output quality and decoding efficiency without additional training.

Contribution

The authors propose FoCore and FoCore_A, training-free methods that enhance generation quality and speed by focusing on high-information-density tokens during decoding.

Findings

01

FoCore improves performance on math, code, and reasoning benchmarks.

02

FoCore_A accelerates decoding by 2.07× and reduces latency by 58.4%.

03

Explicitly conditioning on HD tokens enhances output quality.

Abstract

The iterative denoising paradigm of Diffusion Large Language Models (DLMs) endows them with a distinct advantage in global context modeling. However, current decoding strategies fail to leverage this capability, typically exhibiting a local preference that overlooks the heterogeneous information density within the context, ultimately degrading generation quality. To address this limitation, we systematically investigate high-information-density (HD) tokens and present two key findings: (1) explicitly conditioning on HD tokens substantially improves output quality; and (2) HD tokens exhibit an early-decoding tendency, converging earlier than surrounding tokens. Motivated by these findings, we propose Focus on the Core (FoCore), a training-free decoding strategy that utilizes HD tokens in a self-contrast manner, wherein HD tokens are temporarily remasked as negative samples, to…
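
The abstract is cut off before it says how the contrast is computed, so the following is only a toy sketch of the general idea of contrasting a full-context prediction against one where the HD tokens are remasked. It is not the authors' algorithm. The scoring rule `(1 + alpha) * pos - alpha * neg` is a generic contrastive-decoding form swapped in for illustration, and `toy_model`, `remask`, `self_contrast`, `alpha`, and the choice of HD positions are all hypothetical.

```python
import math

MASK = "<mask>"


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def remask(tokens, hd_positions):
    """Return a copy of tokens with the given positions replaced by MASK."""
    return [MASK if i in hd_positions else t for i, t in enumerate(tokens)]


def self_contrast(pos_logits, neg_logits, alpha=1.0):
    """Generic contrastive score (an assumption, not the paper's formula):
    boost what the full context predicts, penalize what is still predicted
    once the HD tokens are hidden."""
    pos = log_softmax(pos_logits)
    neg = log_softmax(neg_logits)
    return [(1.0 + alpha) * p - alpha * n for p, n in zip(pos, neg)]


def toy_model(tokens):
    """Hypothetical stand-in for a DLM: logits over ["4", "5", "7"] for one
    masked answer slot. Seeing an operand "2" shifts mass toward "4"."""
    return [2.0, 1.0, 0.5] if "2" in tokens else [1.0, 1.0, 1.0]


vocab = ["4", "5", "7"]
context = ["2", "+", "2", "=", MASK]
hd_positions = {0, 2}  # assumed HD tokens: the two operands

pos_logits = toy_model(context)                        # positive: full context
neg_logits = toy_model(remask(context, hd_positions))  # negative: HD remasked
scores = self_contrast(pos_logits, neg_logits, alpha=1.0)
best = vocab[max(range(len(vocab)), key=scores.__getitem__)]
```

In this toy setup the negative pass is uninformative once the operands are hidden, so the contrast simply sharpens the preference already present in the full-context pass. A real DLM would produce the two sets of logits from two forward passes over the partially masked sequence.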
