There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models
Łukasz Staniszewski, Łukasz Kuciński, Kamil Deja

TL;DR
This paper analyzes the relationship between noise and image inversions in diffusion models, revealing structural patterns in latents and proposing a simple fix to improve image editing and interpolation capabilities.
Contribution
It introduces a novel method replacing initial DDIM inversion steps with forward diffusion to enhance latent decorrelation and editing quality.
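The replacement described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the linear `toy_alpha_bars` schedule, the analytic `toy_eps_model` (standing in for a trained noise-prediction network), and the parameter `k` (the number of initial inversion steps replaced) are all assumptions made for the example.

```python
import numpy as np

def toy_alpha_bars(num_steps=50):
    # Toy schedule (assumption): alpha_bar falls linearly from 1 (clean image) to ~0 (noise).
    return np.linspace(1.0, 1e-3, num_steps + 1)

def toy_eps_model(x_t, alpha_bar_t):
    # Stand-in for a trained noise predictor (assumption): the optimal prediction
    # when the data itself is standard Gaussian. A real model is a neural network.
    return np.sqrt(1.0 - alpha_bar_t) * x_t

def ddim_inversion_step(x_t, ab_t, ab_next, eps_model):
    # One deterministic DDIM step run toward higher noise levels.
    eps = eps_model(x_t, ab_t)
    x0_pred = (x_t - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
    return np.sqrt(ab_next) * x0_pred + np.sqrt(1.0 - ab_next) * eps

def ddim_invert(x0, alpha_bars, eps_model, start=0, x_start=None):
    # Plain DDIM inversion from step `start` up to the final noise level.
    x = x0 if x_start is None else x_start
    for i in range(start, len(alpha_bars) - 1):
        x = ddim_inversion_step(x, alpha_bars[i], alpha_bars[i + 1], eps_model)
    return x

def forward_then_invert(x0, alpha_bars, eps_model, k, rng):
    # Replace the first k inversion steps with one forward-diffusion jump to
    # step k (fresh Gaussian noise), then continue with DDIM inversion.
    noise = rng.standard_normal(x0.shape)
    x_k = np.sqrt(alpha_bars[k]) * x0 + np.sqrt(1.0 - alpha_bars[k]) * noise
    return ddim_invert(x0, alpha_bars, eps_model, start=k, x_start=x_k)

rng = np.random.default_rng(0)
alpha_bars = toy_alpha_bars()
flat_image = np.full((16, 16), 0.5)  # a perfectly smooth "image"
latent_ddim = ddim_invert(flat_image, alpha_bars, toy_eps_model)
latent_fixed = forward_then_invert(flat_image, alpha_bars, toy_eps_model, k=5, rng=rng)
print(latent_ddim.std(), latent_fixed.std())
```

With this toy predictor, a flat input stays flat under plain inversion, while the forward-diffusion variant injects noise with non-zero spread. The toy only illustrates the mechanics; it says nothing about how a real model behaves or which `k` is appropriate.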
Findings
Latents show less diverse noise in smooth image areas.
First inversion steps are critical for accurate noise representation.
Replacing initial inversion steps improves editing and interpolation.
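One way to probe the first finding is to compare the local spread of a latent inside smooth and textured image regions. The sketch below is a hypothetical diagnostic, not the paper's evaluation protocol: it assumes a single-channel `image` and a `latent` of the same spatial size, and the names `patch_std`, `latent_std_by_region`, and the `smooth_threshold` value are introduced only for illustration.

```python
import numpy as np

def patch_std(array, patch=8):
    # Standard deviation inside each non-overlapping patch x patch block.
    h, w = array.shape
    h_c, w_c = h - h % patch, w - w % patch  # crop to a multiple of the patch size
    blocks = array[:h_c, :w_c].reshape(h_c // patch, patch, w_c // patch, patch)
    return blocks.std(axis=(1, 3))

def latent_std_by_region(image, latent, patch=8, smooth_threshold=0.02):
    # Split patches into "smooth" and "textured" by the image's local std
    # (the threshold is an arbitrary assumption), then average the latent's
    # local std within each group.
    image_std = patch_std(image, patch)
    latent_std = patch_std(latent, patch)
    smooth = image_std < smooth_threshold
    smooth_mean = float(latent_std[smooth].mean()) if smooth.any() else float("nan")
    textured_mean = float(latent_std[~smooth].mean()) if (~smooth).any() else float("nan")
    return smooth_mean, textured_mean
```

If inverted latents carry less diverse noise in smooth areas, the first returned value would be noticeably lower than the second, whereas for freshly sampled Gaussian noise the two should be close.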
Abstract
Diffusion Models achieve state-of-the-art performance in generating new samples but lack a low-dimensional latent space that encodes the data into editable features. Inversion-based methods address this by reversing the denoising trajectory, transferring images to their approximated starting noise. In this work, we thoroughly analyze this procedure and focus on the relation between the initial noise, the generated samples, and their corresponding latent encodings obtained through the DDIM inversion. First, we show that latents exhibit structural patterns in the form of less diverse noise predicted for smooth image areas (e.g., plain sky). Through a series of analyses, we trace this issue to the first inversion steps, which fail to provide accurate and diverse noise. Consequently, the DDIM inversion space is notably less manipulative than the original noise. We show that prior inversion…
Peer Reviews
Decision: ICLR 2026 Poster
- The paper provides a thorough and insightful diagnosis of the underlying causes of latent distortions introduced during inversion.
- The experiments are conducted across a diverse set of diffusion models, supporting the generality of the findings.
- The proposed forward-diffusion replacement is conceptually simple, easy to integrate, and empirically improves both interpolation smoothness and the diversity of editing outcomes.
- The analysis focuses on diffusion-based models and does not provide evidence that similar problems occur in flow-matching models (e.g., FLUX, Stable Diffusion 3).
- The paper evaluates mainly interpolation and text-guided editing, but does not explore other inversion-driven applications (e.g., local editing, style transfer).
1. The authors provide a nice observation regarding the statistics of inverted noise latents compared to the native noise space, and the repercussions regarding the usage of these latents for image manipulation and editing.
2. The authors provide a simple remedy to fix the inverted latent statistics, and as such their downstream editing potential, with minimal cost of reconstruction quality.
3. The authors provide thorough quantitative evaluation of the tasks presented (image interpolation and t
1. Adding random Gaussian noise instead of inversion in the first steps. The authors claim that the last steps are not important for reconstruction quality, but the diffusion process acts as a coarse-to-fine spectral regressor throughout the whole process [1]. This means that the last steps should correspond to fine-grained details, such as textures. This can be observed, for example, in the results in Figure 6 (top right), where the tower in the background is generated with small windows, which
The paper provides a structured analysis of the problem, which can be valuable to the community. While many of its conclusions are already well established within the field (e.g., that DDIM-inverted latents are not Gaussian noise), the experiments presented in the methods section offer tangible evidence that reaffirms these claims. Furthermore, the paper proposes a simple solution that appears to effectively mitigate the identified issues. Overall, the presentation is clear and easy to follow.
My main concern with the paper lies in its contributions. The method is organized into three subsections, in which the authors address the following questions:
- Sec 3.1: Is there any difference between the original noise and the DDIM inversion noise? (Answer: Yes)
- Sec 3.2: How does it differ? (Answer: Loss of variance, mostly in plain regions)
- Sec 3.3: Why? (Answer: Because it happens in the first few steps)
In my opinion, the first two sections primarily reiterate observations and facts
Taxonomy
Topics: Advanced Neuroimaging Techniques and Applications
Methods: Diffusion
