Focus on the Core: Empowering Diffusion Large Language Models by Self-Contrast
Jinyuan Feng, Xin Yu, Yiqun Chen, Xiaochi Wei, Yan Gao, Yi Wu, Yao Hu, Zhiqiang Pu

TL;DR
This paper introduces FoCore, a novel decoding strategy for Diffusion Large Language Models that leverages high-information-density tokens to improve output quality and decoding efficiency without additional training.
Contribution
The authors propose FoCore and FoCore_A, training-free methods that enhance generation quality and speed by focusing on high-information-density tokens during decoding.
Findings
FoCore improves performance on math, code, and reasoning benchmarks.
FoCore_A accelerates decoding by 2.07x and reduces latency by 58.4%.
Explicitly conditioning on high-information-density (HD) tokens enhances output quality.
Abstract
The iterative denoising paradigm of Diffusion Large Language Models (DLMs) endows them with a distinct advantage in global context modeling. However, current decoding strategies fail to leverage this capability, typically exhibiting a local preference that overlooks the heterogeneous information density within the context, ultimately degrading generation quality. To address this limitation, we systematically investigate high-information-density (HD) tokens and present two key findings: (1) explicitly conditioning on HD tokens substantially improves output quality; and (2) HD tokens exhibit an early-decoding tendency, converging earlier than surrounding tokens. Motivated by these findings, we propose Focus on the Core (FoCore), a training-free decoding strategy that utilizes HD tokens in a self-contrast manner, wherein HD tokens are temporarily remasked as negative samples, to…
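The self-contrast idea in the abstract (remask the HD tokens to form a negative sample, then contrast it against the full context) can be illustrated with a minimal sketch. The abstract is truncated and does not give the scoring rule, so everything here is an assumption: the combination `(1 + alpha) * positive - alpha * negative` is the generic contrastive-decoding form, not necessarily the paper's, and `MASK`, `remask`, `self_contrast_logits`, `alpha`, and `toy_logits_fn` are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Sequence

MASK = -1  # hypothetical mask token id (assumption, not from the paper)

Logits = List[List[float]]  # indexed as [position][vocab_id]


def remask(tokens: Sequence[int], hd_positions: Sequence[int]) -> List[int]:
    """Return a copy of `tokens` with the HD positions replaced by MASK."""
    hd = set(hd_positions)
    return [MASK if i in hd else t for i, t in enumerate(tokens)]


def self_contrast_logits(
    tokens: Sequence[int],
    hd_positions: Sequence[int],
    logits_fn: Callable[[List[int]], Logits],
    alpha: float = 1.0,
) -> Logits:
    """Contrast logits from the full context against logits with HD tokens remasked.

    Assumed generic form: (1 + alpha) * positive - alpha * negative.
    """
    positive = logits_fn(list(tokens))                 # context that keeps HD tokens
    negative = logits_fn(remask(tokens, hd_positions))  # HD tokens hidden
    return [
        [(1 + alpha) * p - alpha * n for p, n in zip(pos_row, neg_row)]
        for pos_row, neg_row in zip(positive, negative)
    ]


def toy_logits_fn(tokens: List[int], vocab_size: int = 3) -> Logits:
    """Toy stand-in for a DLM: each position's logit for v counts unmasked v's."""
    counts = [float(sum(1 for t in tokens if t == v)) for v in range(vocab_size)]
    return [list(counts) for _ in tokens]


# Position 1 (token id 2) is treated as the HD token and remasked for the negative pass.
contrasted = self_contrast_logits([0, 2, MASK], [1], toy_logits_fn, alpha=1.0)
```

In this toy setup the logit for token id 2 is boosted relative to token id 0, because only the positive pass can see the HD token; a real DLM forward pass would replace `toy_logits_fn`.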
