ReDDiT: Rehashing Noise for Discrete Visual Generation
Tianren Ma, Xiaosong Zhang, Boyu Yang, Junlan Feng, Qixiang Ye

TL;DR
ReDDiT introduces a novel noise rehashing technique for discrete diffusion models, enhancing their expressive capacity and diversity, resulting in significantly improved image generation quality comparable to continuous models.
Contribution
The paper proposes a rehashing noise approach for discrete diffusion transformers, extending absorbing states and improving diversity and quality in discrete visual generation.
Findings
ReDDiT reduces gFID from 6.18 (baseline) to 1.61 (lower is better).
Achieves generation quality on par with continuous models.
Enhances diversity and consistency in discrete diffusion processes.
Abstract
In the visual generative area, discrete diffusion models are gaining traction for their efficiency and compatibility. However, pioneering attempts still fall behind their continuous counterparts, which we attribute to noise (absorbing state) design and sampling heuristics. In this study, we propose a rehashing noise approach for discrete diffusion transformer (termed ReDDiT), with the aim of extending absorbing states and improving the expressive capacity of discrete diffusion models. ReDDiT enriches the potential paths that latent variables traverse during training with randomized multi-index corruption. The derived rehash sampler, which reverses the randomized absorbing paths, guarantees high diversity and low discrepancy of the generation process. These reformulations lead to more consistent and competitive generation quality, mitigating the need for heavily tuned randomness. Experiments show…
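The "randomized multi-index corruption" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the single mask token is replaced by m absorbing indices appended after the codebook, and that a corrupted position picks one of them uniformly (probability 1/m, the kernel value a reviewer mentions for equation 8). The function name `rehash_corrupt`, the corruption probability `t`, the index layout, and the pairing of a 16,384-entry codebook with m=128 are all hypothetical choices for illustration.

```python
import random

def rehash_corrupt(tokens, t, vocab_size, m, rng):
    """Corrupt each token independently with probability t (illustrative only).

    A corrupted position receives one of m absorbing (noise) indices,
    vocab_size .. vocab_size + m - 1, drawn uniformly (1/m each).
    With m = 1 this reduces to ordinary single-mask absorbing corruption.
    """
    out = []
    for tok in tokens:
        if rng.random() < t:
            # Assumed index layout: noise indices sit right after the codebook.
            out.append(vocab_size + rng.randrange(m))
        else:
            out.append(tok)
    return out

rng = random.Random(0)
VOCAB, M = 16384, 128  # hypothetical pairing of codebook size and noise capacity
clean = [rng.randrange(VOCAB) for _ in range(256)]
noisy = rehash_corrupt(clean, t=0.5, vocab_size=VOCAB, m=M, rng=rng)
n_noise = sum(tok >= VOCAB for tok in noisy)
print(f"{n_noise} of {len(noisy)} positions replaced by noise indices")
```

Under these assumptions, two corrupted sequences with the same masked positions can still differ in which noise index each position holds, which is one way to read the claim that rehashing enriches the paths latent variables traverse during training.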
Peer Reviews
Decision: ICLR 2026 Poster
1. The introduction of rehashing noise provides a novel way to enrich latent variable traversal, offering improved diversity and higher quality generation in discrete diffusion models. 2. ReDDiT outperforms the baseline models (MaskGIT and DDM) on critical metrics like gFID and IS, with competitive efficiency when compared to continuous models. 3. The model works effectively with large vocabulary codebooks (up to 16,384 entries), demonstrating its robustness even when scaled.
1. Although ReDDiT reports strong numbers on ImageNet‑1K, its effectiveness on more complex or diverse datasets is not discussed. Moreover, for a generation paper, the visualization evidence is quite limited, making it difficult to fully assess the qualitative improvements or appreciate the contribution beyond the single benchmark. 2. The paper does not discuss potential limitations or failure cases of rehashing noise, especially under large‑vocabulary tokenizers or more difficult semantic dist
1. Significant empirical improvement: ReDDiT delivers a substantial leap in generation quality for discrete models, closing much of the gap with continuous diffusion while preserving the efficiency advantages of discrete token-based generation. 2. Insightful problem diagnosis and elegant solution: The paper clearly articulates the shortcomings of single-mask noise and Gumbel-based sampling, and the proposed rehashing noise mechanism is both theoretically motivated and practically effective. 3. Cl
1. Minor performance gap with best continuous models: While ReDDiT achieves gFID = 1.61, the best continuous models (e.g., MDTv2) report gFID ≈ 1.58 under similar settings. Although the efficiency advantage is compelling, the paper could more explicitly acknowledge this small but notable gap as a limitation or future direction. 2. Tokenizer-dependent hyperparameter tuning: The optimal noise capacity m varies with the tokenizer (e.g., m=128 for LlamaGen-f8 vs. m=1024 for IBQ), requiring empirical
1. The paper proposes a new sampling method that facilitates efficient and diverse generation for discrete diffusion, and it has a good structure with clear motivation, distinctive contributions, and solid theory. 2. The proposed method is well supported by the authors’ experimental results, either in tables other than figures.
1. The high diversity of the generation can be seen from the qualitative examples, but what does "low discrepancy" mean for the proposed sampler? 2. The methodology section is somewhat confusing. For equation 6, the definition of \mathbf{m}_j is not clear. Does \mathbf{m}_j indicate an absorbing token at the j-th position of the vector? Also, i=0 and j=0 do not make sense for indices starting from 1. For equation 8, after rewriting equation 1, why does the transition kernel become \frac{1}{m}? If it is
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies · Human Motion and Animation
Methods: Diffusion
