Add-SD: Rational Generation without Manual Reference
Lingfeng Yang, Xinyu Zhang, Xiang Li, Jinwen Chen, Kun Yao, Gang Zhang, Errui Ding, Lingqiao Liu, Jingdong Wang, Jian Yang

TL;DR
Add-SD is a text-conditioned diffusion model that automatically inserts objects into images with rational sizes and positions, improving data diversity and performance on downstream tasks, especially for rare classes.
Contribution
It introduces a dataset and fine-tunes a diffusion model for rational object insertion based solely on text prompts, without manual references.
Findings
Improves rare class detection by 4.3 mAP on LVIS.
Creates a large dataset of instructed image pairs for training.
Enhances downstream task performance with synthetic data.
Abstract
Diffusion models have exhibited remarkable prowess in visual generalization. Building on this success, we introduce an instruction-based object addition pipeline, named Add-SD, which automatically inserts objects into realistic scenes with rational sizes and positions. Different from layout-conditioned methods, Add-SD is solely conditioned on simple text prompts rather than any other human-costly references like bounding boxes. Our work contributes in three aspects: proposing a dataset containing numerous instructed image pairs; fine-tuning a diffusion model for rational generation; and generating synthetic data to boost downstream tasks. The first aspect involves creating a RemovalDataset consisting of original-edited image pairs with textual instructions, where an object has been removed from the original image while maintaining strong pixel consistency in the background. These data…
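To make the structure of an instructed image pair concrete, here is a minimal Python sketch. It is not the authors' code: the names `InstructedPair`, `background_mad`, and `to_addition_sample` are hypothetical, images are modeled as small grayscale grids rather than real arrays, and the object mask is assumed to be available. The helper that reverses a removal pair into an "add" training sample is also an assumption about how such pairs could be used; the visible part of the abstract does not spell this step out.

```python
from dataclasses import dataclass
from typing import Dict, List

# Stand-in for a real image array: grayscale values in row-major order.
Image = List[List[int]]
Mask = List[List[bool]]  # True where the removed object used to be


@dataclass
class InstructedPair:
    """One hypothetical RemovalDataset record."""
    original: Image    # image that still contains the object
    edited: Image      # same image with the object removed
    instruction: str   # textual instruction, e.g. "Remove the dog"


def background_mad(pair: InstructedPair, mask: Mask) -> float:
    """Mean absolute pixel difference outside the object mask.

    A value near zero indicates strong pixel consistency in the background.
    """
    total, count = 0, 0
    for row_o, row_e, row_m in zip(pair.original, pair.edited, mask):
        for o, e, inside_object in zip(row_o, row_e, row_m):
            if not inside_object:
                total += abs(o - e)
                count += 1
    return total / count if count else 0.0


def to_addition_sample(pair: InstructedPair, category: str) -> Dict[str, object]:
    """Assumption: reverse the removal pair so the model learns to add.

    The object-free image becomes the input and the original image the target.
    """
    return {
        "input": pair.edited,
        "target": pair.original,
        "instruction": f"Add a {category}",
    }


# Tiny 2x2 example: the object occupies the bottom-right pixel.
pair = InstructedPair(
    original=[[10, 10], [10, 200]],
    edited=[[10, 10], [10, 12]],
    instruction="Remove the dog",
)
mask = [[False, False], [False, True]]
consistency = background_mad(pair, mask)
sample = to_addition_sample(pair, "dog")
```

In this toy example only the masked pixel changes, so the background difference is zero. A real pipeline would likely apply a small tolerance rather than require exact equality.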
Taxonomy
Topics: Model-Driven Software Engineering Techniques
Methods: Diffusion
