DiffEdit: Diffusion-based semantic image editing with mask guidance
Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord

TL;DR
DiffEdit uses text-conditioned diffusion models for semantic image editing. It automatically generates a mask of the regions to edit, so that changes requested by the text prompt stay localized and the rest of the image content is preserved.
Contribution
DiffEdit introduces an automatic mask generation method for diffusion-based image editing, improving flexibility and performance over approaches that depend on a user-provided mask.
Findings
Achieves state-of-the-art editing performance on ImageNet
Remains effective in more challenging settings, using COCO images and text-generated images
Demonstrates high-quality, content-preserving edits
Abstract
Image generation has recently seen tremendous advances, with diffusion models able to synthesize convincing images for a large variety of text prompts. In this article, we propose DiffEdit, a method that takes advantage of text-conditioned diffusion models for the task of semantic image editing, where the goal is to edit an image based on a text query. Semantic image editing is an extension of image generation, with the additional constraint that the generated image should be as similar as possible to a given input image. Current editing methods based on diffusion models usually require the user to provide a mask, which makes the task much easier by treating it as a conditional inpainting task. In contrast, our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited, by contrasting predictions of a diffusion model conditioned on…
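The mask idea in the abstract, contrasting two sets of diffusion model predictions, can be illustrated with a small sketch. This is not the authors' implementation. It assumes the noise estimates have already been produced by some diffusion model (not shown) for several noised copies of the input, once per conditioning. The function name `contrast_mask`, the mean absolute difference, the min-max rescaling, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import List, Sequence

Grid = List[List[float]]


def contrast_mask(
    eps_query: Sequence[Grid],
    eps_reference: Sequence[Grid],
    threshold: float = 0.5,  # assumed cut-off, chosen for illustration
) -> List[List[int]]:
    """Binary edit mask from paired noise estimates (hypothetical helper).

    eps_query[k][i][j] and eps_reference[k][i][j] are the noise estimates
    at pixel (i, j) for the k-th noised copy of the input image, under the
    two different conditionings being contrasted.
    """
    if not eps_query or len(eps_query) != len(eps_reference):
        raise ValueError("need the same, non-zero number of samples for both")

    n = len(eps_query)
    rows, cols = len(eps_query[0]), len(eps_query[0][0])

    # Average the absolute difference over noise samples to reduce variance.
    diff = [
        [
            sum(abs(eps_query[k][i][j] - eps_reference[k][i][j]) for k in range(n)) / n
            for j in range(cols)
        ]
        for i in range(rows)
    ]

    # Rescale to [0, 1] so that one threshold works across images.
    lo = min(min(row) for row in diff)
    hi = max(max(row) for row in diff)
    span = hi - lo
    if span == 0:
        # The two conditionings agree everywhere: nothing to edit.
        return [[0] * cols for _ in range(rows)]

    # Pixels where the predictions disagree most are marked for editing.
    return [[1 if (v - lo) / span >= threshold else 0 for v in row] for row in diff]


if __name__ == "__main__":
    query = [[[0.0, 1.0], [0.0, 0.2]], [[0.0, 3.0], [0.0, 0.0]]]
    reference = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    print(contrast_mask(query, reference))
```

In this toy input only the top-right pixel differs strongly between the two sets of estimates, so it is the only one flagged for editing. A real pipeline would work on image or latent tensors and would typically use an array library rather than nested lists.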
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques
Methods: Inpainting · Diffusion
