NoiseAR: AutoRegressing Initial Noise Prior for Diffusion Models
Zeming Li, Xiangyue Liu, Xiangyu Zhang, Ping Tan, Heung-Yeung Shum

TL;DR
NoiseAR introduces a learnable, autoregressive prior for initial noise in diffusion models, enabling better control, improved sample quality, and integration with probabilistic frameworks.
Contribution
It proposes a novel autoregressive method to generate structured, controllable initial noise priors for diffusion models, enhancing flexibility and performance.
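Here is a rough sketch of the idea: the initial noise is generated patch by patch, rather than drawn i.i.d. from N(0, I). It assumes a patch-wise factorization p(z_T | c) = ∏_i p(z_i | z_{<i}, c) with Gaussian conditionals and a conditioning vector c, such as a text embedding. Several parts are hypothetical simplifications for illustration and do not come from the authors' architecture:

- the names `PatchARNoisePrior` and `patches_to_latent`;
- the linear parameterization;
- the first-order dependence, where each patch sees only the previous patch.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def gaussian_log_prob(x, mu, log_std):
    """Log-density of x under a diagonal Gaussian N(mu, exp(log_std)^2)."""
    u = (x - mu) / np.exp(log_std)
    return float(-0.5 * np.sum(u ** 2 + 2.0 * log_std + LOG_2PI))


class PatchARNoisePrior:
    """Toy patch-wise autoregressive Gaussian prior over the initial noise z_T.

    Hypothetical simplification: each patch depends only on the previous patch
    and a conditioning vector through linear maps; a real model would attend
    to all earlier patches.
    """

    def __init__(self, num_patches, patch_dim, cond_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.num_patches, self.patch_dim = num_patches, patch_dim
        in_dim = patch_dim + cond_dim
        # Small random weights stand in for trained parameters and keep the
        # prior close to the standard Gaussian N(0, I).
        self.w_mu = 0.01 * rng.standard_normal((in_dim, patch_dim))
        self.w_log_std = 0.01 * rng.standard_normal((in_dim, patch_dim))

    def _params(self, prev_patch, cond):
        h = np.concatenate([prev_patch, cond])
        return h @ self.w_mu, h @ self.w_log_std

    def sample(self, cond, rng):
        """Draw patches one at a time; return (patches, total log-likelihood)."""
        patches = np.zeros((self.num_patches, self.patch_dim))
        prev = np.zeros(self.patch_dim)  # all-zero "start" patch
        total = 0.0
        for i in range(self.num_patches):
            mu, log_std = self._params(prev, cond)
            x = mu + np.exp(log_std) * rng.standard_normal(self.patch_dim)
            total += gaussian_log_prob(x, mu, log_std)
            patches[i] = x
            prev = x
        return patches, total

    def log_prob(self, patches, cond):
        """Exact log-likelihood of given patches (teacher forcing)."""
        prev = np.zeros(self.patch_dim)
        total = 0.0
        for x in patches:
            mu, log_std = self._params(prev, cond)
            total += gaussian_log_prob(x, mu, log_std)
            prev = x
        return total


def patches_to_latent(patches, grid, size):
    """Reassemble a (grid*grid, size*size) patch array into a square latent."""
    return (patches.reshape(grid, grid, size, size)
            .transpose(0, 2, 1, 3)
            .reshape(grid * size, grid * size))


# Usage: a 32x32 single-channel latent split into a 4x4 grid of 8x8 patches.
prior = PatchARNoisePrior(num_patches=16, patch_dim=64, cond_dim=32)
text_embedding = np.random.default_rng(1).standard_normal(32)  # stand-in for a prompt embedding
patches, log_p = prior.sample(text_embedding, np.random.default_rng(2))
z_T = patches_to_latent(patches, grid=4, size=8)
```

Because every conditional is Gaussian, the prior returns both a sample and its exact log-likelihood. With all weights set to zero, it reduces to the usual isotropic Gaussian initial noise.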
Findings
Improved sample quality with learned initial noise
Enhanced control via text prompts influencing the prior
Seamless integration into probabilistic frameworks
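Integration with probabilistic frameworks relies on one property: a learned prior assigns an explicit likelihood to each noise sample, whereas a fixed N(0, I) has nothing to optimize. One way to see this is to apply the standard Direct Preference Optimization (DPO) loss to noise log-likelihoods. The sketch makes these assumptions:

- the priors are diagonal Gaussians;
- `z_w` produced a preferred image and `z_l` a rejected one;
- `mu_theta` and β = 0.1 are hypothetical placeholders, not the paper's training setup.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def diag_gaussian_log_prob(z, mu, log_std):
    """Log-density of noise z under a diagonal Gaussian N(mu, exp(log_std)^2)."""
    u = (z - mu) / np.exp(log_std)
    return float(-0.5 * np.sum(u ** 2 + 2.0 * log_std + LOG_2PI))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_*: log-likelihoods of the two initial noises under the trainable prior.
    ref_logp_*: the same quantities under a frozen reference prior.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) = log(1 + exp(-margin)), computed stably.
    return float(np.logaddexp(0.0, -margin))


# Usage with hypothetical 8-dim noises. Reference prior: N(0, I).
# Trainable prior: N(mu_theta, I) with a placeholder shifted mean.
d = 8
z_w = np.full(d, 0.5)   # noise that yielded the preferred image (hypothetical)
z_l = np.full(d, -0.5)  # noise that yielded the rejected image (hypothetical)
zeros = np.zeros(d)
mu_theta = np.full(d, 0.2)

loss = dpo_loss(
    diag_gaussian_log_prob(z_w, mu_theta, zeros),
    diag_gaussian_log_prob(z_l, mu_theta, zeros),
    diag_gaussian_log_prob(z_w, zeros, zeros),
    diag_gaussian_log_prob(z_l, zeros, zeros),
)
```

The loss equals log 2 when the trainable and reference priors coincide. It falls below log 2 when the trainable prior raises the likelihood of the preferred noise, relative to the reference, more than that of the rejected noise.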
Abstract
Diffusion models have emerged as powerful generative frameworks, creating data samples by progressively denoising an initial random state. Traditionally, this initial state is sampled from a simple, fixed distribution such as an isotropic Gaussian, inherently lacking structure and a direct mechanism for external control. While recent efforts have explored ways to introduce controllability into the diffusion process, particularly at the initialization stage, they often rely on deterministic or heuristic approaches. These methods can be suboptimal, lack expressiveness, and are difficult to scale or integrate into more sophisticated optimization frameworks. In this paper, we introduce NoiseAR, a novel method for AutoRegressive Initial Noise Prior for Diffusion Models. Instead of a static, unstructured source, NoiseAR learns to generate a dynamic and controllable prior distribution for the…
Peer Reviews
Decision: Submitted to ICLR 2026
Given the paper's overclaimed contributions and lack of comparison to the state-of-the-art, the strengths are limited: 1. The specific architectural choices, such as the patch-based autoregressive model, are technically sound. 2. The method demonstrates clear empirical improvements over the limited set of baselines chosen for comparison.
The submission's primary weakness is a critical failure of scholarship. It completely omits a mature and active line of research on the exact problem this paper claims to be exploring. The paper's entire narrative is built on the claim that influencing the "foundational starting point" ($z_T$) "remains relatively underexplored". This is factually incorrect. A significant body of work is dedicated to this exact problem, invalidating the submission's core claims to novelty. The authors fail to cit…
- First to apply autoregressive modeling to initial noise prior learning, filling a gap in structured control of the diffusion starting state, with text prompts directly guiding noise generation.
- Validated across multiple datasets and models, with ablation studies clarifying the impact of each parameter and DPO experiments showing that the prior can be optimized iteratively, supporting reliable conclusions.
The authors propose a universal optimized solution for the initial noise distribution, and I have several questions about it:
- How does the size of the training set used to train this model compare with that of the base model? If it is larger, wouldn't the training cost be too high, and why not simply fine-tune the base model instead? If it is smaller, for samples that the base model has encountered but this model has not, could this initial noise optimization approach degrade…
- The idea of generating the initial noise is interesting and seems a promising direction.
- The proposed method is plug-and-play and can improve diffusion models.
- The experiments consider multiple aspects of the generated images, including aesthetics (AES), human preference (HPSv2), and semantics (CLIP score). NoiseAR is shown to improve multiple metrics consistently.
- The practical advantage over initial noise optimization (INO) is not well discussed. Is NoiseAR better than INO, or is it suited to different tasks?
- Besides generating the initial noise, a simpler dictionary-based noise collection method [a] has been proposed. How does the proposed method compare with it? [a] The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
- Difference from the two-stage generation method: Having additional parameters in the NoiseAR module should b…
Taxonomy
Topics: Model Reduction and Neural Networks
