EraseDraw: Learning to Draw Step-by-Step via Erasing Objects from Images

Alper Canberk, Maksym Bondarenko, Ege Ozguroglu, Ruoshi Liu, and Carl Vondrick

arXiv:2409.00522 · cs.CV · December 25, 2024

Open Access

TL;DR

EraseDraw learns object insertion by inverting object removal: erasing objects from images yields high-quality training pairs, which are then used to train a model that inserts objects into diverse images in a spatially and visually consistent way.

Contribution

The paper presents a scalable data generation pipeline for training object insertion models by reversing object removal, leading to a state-of-the-art diffusion model for in-the-wild image editing.
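The idea of reversing object removal can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's pipeline: the `InsertionExample` type, the `make_insertion_example` helper, the `"add a {label}"` instruction template, and the `toy_erase` stand-in (which simply blanks masked pixels instead of running a real removal model) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy types: a grayscale image as nested lists, and a boolean object mask.
Image = List[List[int]]
Mask = List[List[bool]]


@dataclass
class InsertionExample:
    source: Image      # image with the object erased (model input)
    target: Image      # original image containing the object (model output)
    instruction: str   # text prompt describing the insertion


def make_insertion_example(
    image: Image,
    mask: Mask,
    label: str,
    erase: Callable[[Image, Mask], Image],
) -> InsertionExample:
    """Build one insertion training pair by inverting an object removal."""
    erased = erase(image, mask)
    # Removal maps original -> erased; training data goes erased -> original.
    # The instruction template is an assumption made for this sketch.
    return InsertionExample(source=erased, target=image,
                            instruction=f"add a {label}")


def toy_erase(image: Image, mask: Mask, fill: int = 0) -> Image:
    """Hypothetical stand-in for a removal model: blank out masked pixels."""
    return [
        [fill if masked else pixel for pixel, masked in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


image = [[1, 9], [1, 1]]
mask = [[False, True], [False, False]]
example = make_insertion_example(image, mask, "cup", toy_erase)
```

In this sketch the removal step is a pluggable function, so any object removal model could be substituted for `toy_erase` without changing how the pairs are assembled.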

Findings

01 Achieves state-of-the-art results in object insertion tasks.

02 Performs well on diverse prompts and images across various domains.

03 Automates iterative insertion using CLIP-guided beam search.
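A score-guided beam search over a sequence of insertion steps might look like the sketch below. It is a generic beam search written under assumptions, not the procedure from the paper: `propose` stands in for sampling candidate edits from an insertion model, `score` stands in for a CLIP image-text similarity, and the toy versions of both operate on tuples of strings rather than images.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

State = TypeVar("State")


def beam_search(
    initial: State,
    prompts: Sequence[str],
    propose: Callable[[State, str], List[State]],
    score: Callable[[State, str], float],
    beam_width: int = 2,
) -> Tuple[State, float]:
    """Apply prompts one by one, keeping the top-scoring partial results."""
    beam: List[Tuple[State, float]] = [(initial, 0.0)]
    for prompt in prompts:
        candidates: List[Tuple[State, float]] = []
        for state, total in beam:
            for nxt in propose(state, prompt):
                candidates.append((nxt, total + score(nxt, prompt)))
        if not candidates:
            continue  # no proposals for this prompt; keep the current beam
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:beam_width]
    return beam[0]


def toy_propose(state: Tuple[str, ...], prompt: str) -> List[Tuple[str, ...]]:
    # Hypothetical stand-in for sampling three edited images per step.
    return [state + (f"{prompt}-v{i}",) for i in range(3)]


def toy_score(state: Tuple[str, ...], prompt: str) -> float:
    # Hypothetical stand-in for a CLIP similarity: favors higher variant ids.
    return float(state[-1].rsplit("-v", 1)[1])


best_state, best_score = beam_search((), ["cup", "spoon"],
                                     toy_propose, toy_score, beam_width=2)
```

Summing per-step scores is one possible design choice; scoring only the final image against the full prompt would be another, and nothing here indicates which the authors use.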

Abstract

Creative processes such as painting often involve creating different components of an image one by one. Can we build a computational model to perform this task? Prior works often fail by making global changes to the image, inserting objects in unrealistic spatial locations, and generating inaccurate lighting details. We observe that while state-of-the-art models perform poorly on object insertion, they can remove objects and erase the background in natural images very well. Inverting the direction of object removal, we obtain high-quality data for learning to insert objects that are spatially, physically, and optically consistent with the surroundings. With this scalable automatic data generation pipeline, we can create a dataset for learning object insertion, which is used to train our proposed text conditioned diffusion model. Qualitative and quantitative experiments have shown that…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques

Methods: Diffusion · Contrastive Language-Image Pre-training