Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Lirui Zhao; Tianshuo Yang; Wenqi Shao; Yuxin Zhang; Yu Qiao; Ping Luo; Kaipeng Zhang; Rongrong Ji

arXiv:2407.16982 · cs.CV · July 25, 2024

TL;DR

Diffree is a novel diffusion-based model that enables seamless, text-guided addition of objects into images without requiring bounding boxes or masks, while maintaining visual consistency and relevance.

Contribution

We introduce Diffree, a text-guided object addition method trained on a new synthetic dataset, which predicts object placement and integrates objects seamlessly using only text prompts.
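The contribution above states that Diffree predicts where an object belongs and integrates it while preserving the background. The page does not say how the final image is assembled, so the following is only a minimal sketch of one common technique, mask-based alpha compositing, which shows why a predicted mask can guarantee that pixels outside the object stay unchanged. The function name `composite_with_mask` and the (H, W, 3) float layout are hypothetical choices, not Diffree's API.

```python
import numpy as np


def composite_with_mask(background: np.ndarray, generated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend a generated image into a background using a soft mask (hypothetical helper).

    background, generated: float arrays of shape (H, W, 3) with values in [0, 1].
    mask: float array of shape (H, W) with values in [0, 1]; 1 marks the new object.
    """
    if background.shape != generated.shape:
        raise ValueError("background and generated must have the same shape")
    if mask.shape != background.shape[:2]:
        raise ValueError("mask must match the spatial size of the images")
    # Add a trailing channel axis so the mask broadcasts over the RGB channels.
    m = np.clip(mask, 0.0, 1.0)[..., None]
    # Where m == 0 the output is exactly the background; where m == 1 it is the generated pixel.
    return m * generated + (1.0 - m) * background
```

Because the output equals `background` wherever the mask is 0, background consistency holds by construction; fractional mask values near the object boundary produce a gradual blend instead of a hard seam.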

Findings

1. High success rate in object addition
2. Maintains background and spatial consistency
3. Outperforms existing methods in relevance and quality

Abstract

This paper addresses the important problem of adding objects to images using only text guidance. The task is challenging because the new object must be integrated seamlessly into the image with a consistent visual context, including lighting, texture, and spatial location. Existing text-guided image inpainting methods can add objects, but they either fail to preserve background consistency or require cumbersome human intervention to specify bounding boxes or user-scribbled masks. To tackle this challenge, we introduce Diffree, a Text-to-Image (T2I) model that performs object addition under text control alone. To this end, we curate OABench, a high-quality synthetic dataset built by removing objects with advanced image inpainting techniques. OABench comprises 74K real-world tuples of an original image, an inpainted image with the object removed, an object mask, and object…
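The abstract lists the contents of each OABench tuple. A minimal sketch of such a record in Python follows. The class, field, and helper names are hypothetical, and the `prompt` field is an assumption: the abstract is truncated before naming the fourth element, and the reviews only indicate that prompts take the form "add {object}". The `to_training_pair` helper likewise encodes an assumed supervision setup (object-removed image plus text in, original image plus mask out), not a documented training procedure.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class OABenchSample:
    """One tuple modeled on the fields named in the abstract (names are hypothetical)."""

    original: np.ndarray   # (H, W, 3) real image that contains the object
    inpainted: np.ndarray  # (H, W, 3) same image with the object removed by inpainting
    mask: np.ndarray       # (H, W) binary mask of the removed object
    prompt: str            # ASSUMED text field; the abstract is truncated here

    def __post_init__(self) -> None:
        # Reject tuples whose images and mask do not line up spatially.
        if self.original.shape != self.inpainted.shape:
            raise ValueError("original and inpainted images must share a shape")
        if self.mask.shape != self.original.shape[:2]:
            raise ValueError("mask must match the image height and width")


def make_prompt(label: str) -> str:
    # Reviewers report that prompts take the form "add {object}".
    return f"add {label}"


def to_training_pair(sample: OABenchSample):
    """Assumed supervision: (object-free image, prompt) -> (original image, object mask)."""
    model_input = (sample.inpainted, sample.prompt)
    targets = (sample.original, sample.mask)
    return model_input, targets
```

Under this reading, the inpainted image plays the role of the user's input photo, and the original image and mask give the model both the appearance and the placement of the object it should add.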

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 5

Strengths

- Object inpainting using only text, without relying on shape constraints.
- The authors built OABench to facilitate text-guided object inpainting.
- The results are attractive, and the supported applications are interesting.

Weaknesses

- Lack of ablations validating the model design. For example, what happens to the output if the OMP is removed? Also, integrating a mask head into the diffusion process is not new, e.g., [1].
- The comparison is slightly unfair. In Figure A10, comparing the model you trained on the curated dataset with other methods that were not retrained or fine-tuned is unfair.
- The paper is poorly written, especially the explanations of the charts. For example, in line 50 of the description section, the auth…

Reviewer 02 · Rating 6 · Confidence 5

Strengths

1. Diffree offers a user-friendly approach to inserting objects into images. Mask-free object insertion is particularly useful in practical applications.
2. The creation of OABench, a large-scale synthetic dataset, is a significant contribution, providing a rich resource for training and evaluating object addition models.
3. The OMP module's ability to predict the target mask and generate inpainting results simultaneously is a novel architectural advancement in this field.
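Reviewer 02 credits the OMP module with predicting the target mask while generating the inpainting result, and Reviewer 01 notes that attaching a mask head to the diffusion process has precedent. This page does not describe the OMP architecture, so the sketch below shows only a generic version of the idea under stated assumptions: a per-pixel linear head (equivalent to a 1x1 convolution) with a sigmoid on top of a denoiser feature map, trained with binary cross-entropy against the ground-truth mask and added to the standard epsilon-prediction MSE loss used in DDPM-style diffusion training. The names `mask_head`, `bce_loss`, `joint_loss`, and the weight `lam` are hypothetical.

```python
import numpy as np


def mask_head(features: np.ndarray, weight: np.ndarray, bias: float) -> np.ndarray:
    """Per-pixel linear projection (a 1x1 convolution) followed by a sigmoid.

    features: (C, H, W) feature map, assumed here to come from the denoiser.
    weight: (C,) projection weights; bias: scalar offset.
    Returns per-pixel mask probabilities of shape (H, W).
    """
    logits = np.einsum("chw,c->hw", features, weight) + bias
    return 1.0 / (1.0 + np.exp(-logits))


def bce_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted probabilities and a {0, 1} mask."""
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def joint_loss(noise_pred: np.ndarray, noise: np.ndarray,
               mask_pred: np.ndarray, mask_target: np.ndarray, lam: float = 1.0) -> float:
    """Epsilon-prediction MSE for the image branch plus a weighted mask term (assumed weighting)."""
    mse = float(np.mean((noise_pred - noise) ** 2))
    return mse + lam * bce_loss(mask_pred, mask_target)
```

The weight `lam` trades off mask accuracy against image denoising; its default here is arbitrary and not taken from the paper.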

Weaknesses

1. [1] proposed a mask prediction method closely related to Diffree. An in-depth analysis and comparison with this work would be beneficial.
2. All prompts used in this paper take the form "add {object}". It is unclear how Diffree generalizes to more precise control, such as "add a dragon in the room".
3. While user-friendly for object insertion, the method prevents users from adjusting the mask. In standard image processing, users or designers often need to make adjustments to achieve their…

Reviewer 03 · Rating 3 · Confidence 3

Strengths

- **Originality**: Diffree’s approach of shape-free object addition guided solely by text is novel, significantly enhancing usability by eliminating the need for manual mask definitions. This innovation in user experience represents a unique contribution.
- **Clarity**: Overall, the dataset creation process and evaluation metrics are well described, with figures that aid understanding of Diffree’s operation and comparative performance.

Weaknesses

- **Minor Typographical Issue**: There is a missing period at the end of line 101, which should be corrected for clarity.
- **Dataset Limitation in Prompt Detail**: Since the dataset primarily relies on the COCO dataset, prompts are often generic object labels rather than detailed, fine-grained descriptions. This limitation can hinder the model's ability to respond to nuanced or interactive prompts, such as requests for specific object attributes or context-based interactions.
- **Methodology Cl…

Code & Models

Repositories

OpenGVLab/Diffree
PyTorch · Official

Models

LiruiZhao/Diffree
model · 10 downloads · 18 likes

Datasets

LiruiZhao/OABench
dataset · 19 downloads

Taxonomy

Methods: Diffusion · Inpainting