Add-SD: Rational Generation without Manual Reference

Lingfeng Yang; Xinyu Zhang; Xiang Li; Jinwen Chen; Kun Yao; Gang Zhang; Errui Ding; Lingqiao Liu; Jingdong Wang; Jian Yang

arXiv:2407.21016·cs.CV·July 31, 2024

TL;DR

Add-SD is a text-conditioned diffusion model that automatically inserts objects into images with rational sizes and positions, improving data diversity and performance on downstream tasks, especially for rare classes.

Contribution

The paper introduces a dataset of instructed image pairs and fine-tunes a diffusion model to insert objects at rational sizes and positions from text prompts alone, without manual references such as bounding boxes.

Findings

01 Improves rare class detection by 4.3 mAP on LVIS.

02 Creates a large dataset of instructed image pairs for training.

03 Enhances downstream task performance with synthetic data.

Abstract

Diffusion models have exhibited remarkable prowess in visual generalization. Building on this success, we introduce an instruction-based object addition pipeline, named Add-SD, which automatically inserts objects into realistic scenes with rational sizes and positions. Different from layout-conditioned methods, Add-SD is solely conditioned on simple text prompts rather than any other human-costly references like bounding boxes. Our work contributes in three aspects: proposing a dataset containing numerous instructed image pairs; fine-tuning a diffusion model for rational generation; and generating synthetic data to boost downstream tasks. The first aspect involves creating a RemovalDataset consisting of original-edited image pairs with textual instructions, where an object has been removed from the original image while maintaining strong pixel consistency in the background. These data…

Code & Models

Repositories

ylingfeng/add-sd (PyTorch, official)

Taxonomy

Topics: Model-Driven Software Engineering Techniques

Methods: Diffusion