Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model
Xin Zhang, Jiaxian Guo, Paul Yoo, Yutaka Matsuo, Yusuke Iwasawa

TL;DR
This paper introduces PhD, a novel image editing framework that combines exemplar images and text descriptions with a pre-trained diffusion model to achieve seamless, high-quality subject editing and scene generation.
Contribution
The paper proposes a new method that uses exemplar images and segmentation to improve subject-driven editing and scene generation with a pre-trained diffusion model, without per-subject fine-tuning.
Findings
Achieves state-of-the-art results in subject-driven image editing.
Demonstrates high-quality, seamless blending of inserted subjects.
Outperforms baseline methods in both quantitative and qualitative evaluations.
Abstract
Text-to-image generative models have attracted rising attention for flexible image editing via user-specified descriptions. However, text descriptions alone are not enough to elaborate the details of subjects, often compromising the subjects' identity or requiring additional per-subject fine-tuning. We introduce a new framework called Paste, Inpaint and Harmonize via Denoising (PhD), which leverages an exemplar image in addition to text descriptions to specify user intentions. In the pasting step, an off-the-shelf segmentation model is employed to identify a user-specified subject within an exemplar image, which is subsequently inserted into a background image to serve as an initialization capturing both scene context and subject identity in one. To guarantee the visual coherence of the generated or edited image, we introduce an inpainting and harmonizing module to guide the…
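The pasting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `paste_subject`, its parameters, and the use of NumPy arrays are assumptions made for illustration, and the subject mask is assumed to come from whatever segmentation model is used (the abstract does not name one). The sketch only shows the hard copy-and-paste that produces the initialization; the inpainting and harmonizing module that follows is not modeled here.

```python
import numpy as np

def paste_subject(background, exemplar, mask, top_left):
    """Hypothetical helper: copy masked exemplar pixels onto a background.

    background: (H, W, 3) uint8 image
    exemplar:   (h, w, 3) uint8 image containing the subject
    mask:       (h, w) boolean array, True on subject pixels
                (assumed to be produced by a segmentation model)
    top_left:   (row, col) where the exemplar crop is placed
    Returns the pasted image and a full-size boolean subject mask.
    """
    out = background.copy()  # leave the caller's background untouched
    r, c = top_left
    h, w = mask.shape
    if r < 0 or c < 0 or r + h > out.shape[0] or c + w > out.shape[1]:
        raise ValueError("subject does not fit inside the background")

    # Basic slicing returns a view, so writing into `region` updates `out`.
    region = out[r:r + h, c:c + w]
    region[mask] = exemplar[mask]

    # Full-size mask marking where the subject now sits in the background.
    full_mask = np.zeros(out.shape[:2], dtype=bool)
    full_mask[r:r + h, c:c + w] = mask
    return out, full_mask
```

The returned full-size mask is included because a later stage would plausibly need to know which pixels belong to the pasted subject; that is a design guess, not something the abstract states.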
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
Methods: Diffusion · Inpainting
