Not all tokens contribute equally to diffusion learning

Guoqing Zhang; Lu Shi; Wanru Xu; Linna Zhang; Sen Wang; Fangfang Wang; Yigang Cen

arXiv:2604.07026 · cs.CV · April 9, 2026

TL;DR

This paper introduces DARE, a framework that enhances semantic guidance in diffusion models by addressing token distribution bias and spatial misalignment, leading to better text-to-video generation.

Contribution

It proposes distribution-aware rectification and spatial ensemble techniques that strengthen the contribution of semantically important tokens to guidance in diffusion-based generative models.

Findings

01. DARE improves generation fidelity and semantic alignment.

02. DR-CFG balances token distribution during training.

03. SRA enhances spatial guidance by reweighting attention maps.
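The summary says SRA reweights attention maps but does not say how. As a generic illustration only, and not the paper's SRA, the sketch below shows one common way to emphasize chosen text tokens in a cross-attention map: multiply their attention probabilities by a gain and renormalize each spatial row. The function names `softmax` and `reweight_cross_attention` and the parameters `token_ids` and `gain` are hypothetical, chosen for this example.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def reweight_cross_attention(attn: np.ndarray, token_ids: list[int], gain: float) -> np.ndarray:
    """Hypothetical token reweighting (NOT the paper's SRA).

    attn: (num_queries, num_tokens) cross-attention probabilities,
          one row per spatial latent position, each row summing to 1.
    token_ids: indices of text tokens to emphasize (illustrative choice).
    gain: multiplicative factor; >1 emphasizes, <1 suppresses.
    """
    weights = np.ones(attn.shape[-1])
    weights[token_ids] = gain
    boosted = attn * weights  # broadcasts the per-token weights over every query row
    return boosted / boosted.sum(axis=-1, keepdims=True)  # rows sum to 1 again

# Usage: 6 spatial positions attending over 5 text tokens.
rng = np.random.default_rng(0)
attn = softmax(rng.normal(size=(6, 5)))
new_attn = reweight_cross_attention(attn, token_ids=[2], gain=3.0)
```

Because each row is renormalized, boosting one token necessarily takes attention mass from the others at that spatial position, which is the sense in which a reweighting can stop an important token from being "overshadowed" by less informative ones.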

Abstract

With the rapid development of conditional diffusion models, significant progress has been made in text-to-video generation. However, we observe that these models often neglect semantically important tokens during inference, leading to biased or incomplete generations under classifier-free guidance. We attribute this issue to two key factors: distributional bias caused by the long-tailed token frequency in training data, and spatial misalignment in cross-attention where semantically important tokens are overshadowed by less informative ones. To address these issues, we propose Distribution-Aware Rectification and Spatial Ensemble (DARE), a unified framework that improves semantic guidance in diffusion models from the perspectives of distributional debiasing and spatial consistency. First, we introduce Distribution-Rectified Classifier-Free Guidance (DR-CFG), which regularizes the…
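For context, DR-CFG modifies classifier-free guidance, and the abstract is truncated before describing how. The sketch below shows only the standard classifier-free guidance combination that such methods start from: the guided noise prediction moves the unconditional prediction toward the conditional one by a guidance scale. The function name `classifier_free_guidance` is hypothetical, and the distribution-rectified variant is not reproduced here.

```python
import numpy as np

def classifier_free_guidance(eps_uncond: np.ndarray, eps_cond: np.ndarray, scale: float) -> np.ndarray:
    """Standard classifier-free guidance (baseline, not DR-CFG).

    eps_uncond: noise prediction with the empty/null prompt.
    eps_cond:   noise prediction with the text prompt.
    scale:      guidance scale; 0 gives eps_uncond, 1 gives eps_cond,
                and >1 extrapolates past the conditional prediction.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Usage with toy predictions.
eps_u = np.zeros(4)
eps_c = np.ones(4)
guided = classifier_free_guidance(eps_u, eps_c, 7.5)
```

Every text token influences the result only through `eps_cond`, so if the model under-represents a token in that prediction, amplifying the conditional direction with a larger `scale` does not recover it; this is the kind of bias the abstract attributes to long-tailed token frequency.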