NoiseAR: AutoRegressing Initial Noise Prior for Diffusion Models

Zeming Li; Xiangyue Liu; Xiangyu Zhang; Ping Tan; Heung-Yeung Shum

arXiv:2506.01337·cs.LG·June 3, 2025

Open Access·3 Reviews

TL;DR

NoiseAR introduces a learnable, autoregressive prior for initial noise in diffusion models, enabling better control, improved sample quality, and integration with probabilistic frameworks.

Contribution

It proposes a novel autoregressive method to generate structured, controllable initial noise priors for diffusion models, enhancing flexibility and performance.

Findings

01. Improved sample quality with learned initial noise

02. Enhanced control via text prompts influencing the prior

03. Seamless integration into probabilistic frameworks

Abstract

Diffusion models have emerged as powerful generative frameworks, creating data samples by progressively denoising an initial random state. Traditionally, this initial state is sampled from a simple, fixed distribution like an isotropic Gaussian, inherently lacking structure and a direct mechanism for external control. While recent efforts have explored ways to introduce controllability into the diffusion process, particularly at the initialization stage, they often rely on deterministic or heuristic approaches. These methods can be suboptimal, lack expressiveness, and are difficult to scale or integrate into more sophisticated optimization frameworks. In this paper, we introduce NoiseAR, a novel method for AutoRegressive Initial Noise Prior for Diffusion Models. Instead of a static, unstructured source, NoiseAR learns to generate a dynamic and controllable prior distribution for the…
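To make the contrast in the abstract concrete, here is a minimal sketch of a fixed isotropic Gaussian initial state next to a toy patch-wise autoregressive prior. It is an illustration only and not the authors' architecture: the function names, the linear dependence on earlier patches, the `weight` parameter, and the `cond` vector (a stand-in for a text-prompt embedding) are all assumptions made for this example.

```python
import numpy as np


def sample_isotropic(n_patches, patch_dim, rng):
    """Baseline: every entry of z_T is an independent N(0, 1) draw."""
    return rng.standard_normal((n_patches, patch_dim))


def sample_ar_prior(n_patches, patch_dim, rng, weight=0.5, cond=None):
    """Toy autoregressive prior over patches (hypothetical, not the paper's model).

    Patch i is drawn from N(mean_i, I), where mean_i is `weight` times the
    average of the patches sampled so far, plus an optional conditioning
    vector `cond` standing in for a text-prompt embedding.
    Returns the patches and their exact log-probability under this prior.
    """
    patches = np.zeros((n_patches, patch_dim))
    log_prob = 0.0
    for i in range(n_patches):
        # Context is the mean of earlier patches; the first patch has none.
        context = patches[:i].mean(axis=0) if i > 0 else np.zeros(patch_dim)
        mean = weight * context
        if cond is not None:
            mean = mean + cond
        eps = rng.standard_normal(patch_dim)
        patches[i] = mean + eps
        # Log-density of a unit-variance Gaussian evaluated at this patch.
        log_prob += -0.5 * np.sum(eps ** 2) - 0.5 * patch_dim * np.log(2.0 * np.pi)
    return patches, log_prob


rng = np.random.default_rng(0)
z_iso = sample_isotropic(16, 4, rng)
z_ar, logp = sample_ar_prior(16, 4, rng, weight=0.5, cond=np.full(4, 0.1))
print(z_iso.shape, z_ar.shape, float(logp))
```

In this sketch the autoregressive sampler returns an explicit log-probability alongside the sample. That property is what would, in principle, let a learned prior of this kind be used inside likelihood-based or preference-based objectives, in line with the page's claim about integration with probabilistic frameworks.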

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 2·Confidence 5

Strengths

Given the paper's overclaimed contributions and lack of comparison to the state-of-the-art, the strengths are limited:
1. The specific architectural choices, such as the patch-based autoregressive model, are technically sound.
2. The method demonstrates clear empirical improvements over the limited set of baselines chosen for comparison.

Weaknesses

The submission's primary weakness is a critical failure of scholarship. It completely omits a mature and active line of research on the exact problem this paper claims to be exploring. The paper's entire narrative is built on the claim that influencing the "foundational starting point" ($z_T$) "remains relatively underexplored". This is factually incorrect. A significant body of work is dedicated to this exact problem, invalidating the submission's core claims to novelty. The authors fail to cit

Reviewer 02·Rating 6·Confidence 5

Strengths

- First to apply autoregressive modeling to initial noise prior learning, filling the gap in structured control of diffusion starting state, with text prompts directly guiding noise generation.
- Validated across multiple datasets and models, with ablation studies clarifying parameter impacts and DPO proving iterability, ensuring reliable conclusions.

Weaknesses

The author proposes a universal optimized solution for the initial noise distribution, and I have several questions regarding this:
- How does the size of the training set used by the author to train this model compare to that of the base model? If it is larger, wouldn’t the training cost be too high, and why not simply fine-tune the base model instead? If it is smaller, for samples that the base model has encountered but this model has not, could this initial noise optimization approach degrade

Reviewer 03·Rating 6·Confidence 3

Strengths

- The idea of generating the initial noise is interesting and seems a promising direction.
- The proposed method is plug-and-play and able to improve diffusion models.
- The experiments consider multiple aspects of the generated images, including aesthetics (AES), human preference (HPSv2), and semantics (CLIP score). NoiseAR is shown to improve multiple metrics stably.

Weaknesses

- Practical advantage over initial noise optimization (INO) is not well discussed. Is NoiseAR better than INO or different in applicable tasks?
- Other than generating the initial noise, a simpler dictionary-based noise collection method [a] was proposed. How does the proposed method compare with it? [a] The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
- Difference from the two-stage generation method: Having additional parameters in the NoiseAR module should b

Taxonomy

Topics: Model Reduction and Neural Networks