PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing
Feng Tian, Yixuan Li, Yichao Yan, Shanyan Guan, Yanhao Ge, Xiaokang Yang

TL;DR
PostEdit is a novel image editing method that combines posterior sampling with diffusion models to achieve high efficiency, controllability, and background preservation without requiring inversion or training, outperforming existing methods.
Contribution
The paper introduces PostEdit, a new diffusion-based image editing approach that efficiently balances editing quality, background consistency, and speed without inversion or training.
Findings
Achieves state-of-the-art editing performance.
Preserves unedited regions accurately.
Operates in approximately 1.5 seconds with 18 GB GPU memory.
Abstract
In the field of image editing, three core challenges persist: controllability, background preservation, and efficiency. Inversion-based methods rely on time-consuming optimization to preserve the features of the initial images, which results in low efficiency due to the requirement for extensive network inference. Conversely, inversion-free methods lack theoretical support for background similarity, as they circumvent the issue of maintaining initial features to achieve efficiency. As a consequence, none of these methods can achieve both high efficiency and background consistency. To tackle the challenges and the aforementioned disadvantages, we introduce PostEdit, a method that incorporates a posterior scheme to govern the diffusion sampling process. Specifically, a corresponding measurement term related to both the initial features and Langevin dynamics is introduced to optimize the…
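The abstract's idea of steering a sample with a measurement term under Langevin dynamics can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it runs plain unadjusted Langevin dynamics on a Gaussian posterior. The masking operator, the noise scales `sigma_y` and `sigma_p`, the step size, and every function and variable name are assumptions chosen for illustration only.

```python
import numpy as np

def langevin_posterior_sample(x_init, y, mask, x_prior,
                              sigma_y=0.1, sigma_p=1.0,
                              step=1e-3, n_steps=500, seed=0):
    """Unadjusted Langevin dynamics on a toy Gaussian posterior.

    Target density (an assumption for this sketch, not the paper's):
        log p(x | y) = -||mask * x - y||^2 / (2 sigma_y^2)
                       -||x - x_prior||^2 / (2 sigma_p^2) + const
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x_init, dtype=float).copy()
    for _ in range(n_steps):
        # Gradient of the measurement (likelihood) term; the mask is its own adjoint.
        grad_measure = -mask * (mask * x - y) / sigma_y**2
        # Gradient of the Gaussian prior centred on the current edited estimate.
        grad_prior = -(x - x_prior) / sigma_p**2
        noise = rng.standard_normal(x.shape)
        # Langevin update: drift along the score plus injected Gaussian noise.
        x = x + step * (grad_measure + grad_prior) + np.sqrt(2.0 * step) * noise
    return x

# Toy demo: the "measurement" keeps the top half of a source image (the background).
source = np.ones((8, 8))
edited_estimate = np.zeros((8, 8))   # hypothetical stand-in for an edited image estimate
mask = np.zeros((8, 8))
mask[:4] = 1.0
y = mask * source

sample = langevin_posterior_sample(edited_estimate, y, mask, edited_estimate)
residual_before = np.abs(mask * edited_estimate - y).sum() / mask.sum()
residual_after = np.abs(mask * sample - y).sum() / mask.sum()
```

In this toy setting the measured region is pulled toward the source image while the unmeasured region stays governed by the prior, which is the intuition behind using a measurement term for background preservation.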
Peer Reviews
Decision·ICLR 2025 Poster
1. By extending the theory of posterior sampling to text-guided image editing tasks, the proposed method, PostEdit, eliminates the need for both inversion and training. 2. PostEdit is one of the fastest zero-shot image editing approaches, achieving execution times of under 2 seconds on an A100 GPU.
While the paper presents an interesting approach, some details and experimental results are not sufficiently comprehensive. Key aspects of the methodology are not fully elaborated, and additional experiments would be beneficial to further validate the claims. 1. The subfigure "Our Posterior Sampling Process" in Fig. 1 is difficult to understand. It is unclear what exactly it is meant to represent. Additionally, how does it highlight the advantages of the proposed algorithm? 2. The authors men
The proposed algorithm is both inversion-free and training-free, contributing to its fast performance. The manuscript is well-written and clear. In editing experiments, the algorithm outperforms state-of-the-art approaches in CLIP similarity.
In image restoration experiments, quantitative evaluation is lacking, as the authors provide only qualitative comparisons with competing methods. Furthermore, the authors do not provide evidence that the reconstruction results produced by their method are consistent with the input measurements. Specifically, it would be helpful to see whether applying the forward measurement operator to the algorithms' output yields results that closely approximate the original measurements.
1. The proposed method's efficiency in terms of GPU memory and inference time is noteworthy. Achieving high-quality results in approximately 1.5 seconds and with only 18 GB of GPU memory is an impressive step forward for zero-shot image editing. 2. The authors provide strong mathematical support for their approach, with the incorporation of Langevin dynamics and a posterior measurement term to optimize the estimated image. This theoretically addresses the issue of error accumulation seen in exis
1. Methodological Clarifications: a. Figure 2: Step 3 in Figure 2 is challenging to understand and requires more explanation regarding the optimization process. It would be beneficial to clearly explain how Langevin dynamics and the measurement term are incorporated in this step to improve the optimization. b. Explanations of symbols such as $\mathcal{A}$ are missing: In Equation (16), the symbol $\mathcal{A}$ is not clearly defined or explained. Adding a clear definition for this symbol wo
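One review above asks whether applying the forward measurement operator to a method's output reproduces the original measurements. A minimal sketch of such a check follows. The helper `relative_measurement_error` and the 2x average-pooling operator `downsample2` are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def relative_measurement_error(x_hat, y, forward_op):
    """Return ||A(x_hat) - y|| / ||y|| for a given forward operator A."""
    y = np.asarray(y, dtype=float)
    residual = forward_op(np.asarray(x_hat, dtype=float)) - y
    # Small constant guards against division by zero for an all-zero measurement.
    return float(np.linalg.norm(residual) / (np.linalg.norm(y) + 1e-12))

def downsample2(x):
    """Hypothetical forward operator: 2x average pooling of a 2-D array."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)
truth = rng.random((8, 8))
y = downsample2(truth)

# A reconstruction equal to the ground truth is perfectly consistent;
# a biased reconstruction is not.
err_exact = relative_measurement_error(truth, y, downsample2)
err_biased = relative_measurement_error(truth + 0.5, y, downsample2)
```

A value near zero indicates that the reconstruction agrees with the measurements; reporting this number alongside qualitative comparisons would address the reviewer's concern.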
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
