Generating Multi-Image Synthetic Data for Text-to-Image Customization
Nupur Kumari, Xi Yin, Jun-Yan Zhu, Ishan Misra, Samaneh Azadi

TL;DR
This paper customizes text-to-image models by building a synthetic multi-image dataset (SynCD) and training an encoder-based model on it that captures fine-grained visual details from reference images, which improves the quality of generated images.
Contribution
The authors propose a novel approach using a synthetic multi-image dataset and an encoder-based model with a shared attention mechanism for better customization of text-to-image generation.
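As a rough illustration of what a shared attention mechanism can look like, the sketch below lets each token of the image being generated attend over its own tokens and the reference-image tokens together. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a single head, identity query/key/value projections, and plain Python lists, and the names `shared_attention`, `target_tokens`, and `reference_tokens` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shared_attention(target_tokens, reference_tokens):
    """Each target token attends over target AND reference tokens.

    Assumption: queries, keys, and values are the raw token vectors
    (identity projections); a real model would use learned projections.
    """
    # Keys/values are the target tokens concatenated with reference tokens.
    keys = target_tokens + reference_tokens
    dim = len(target_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in target_tokens:
        weights = softmax([dot(q, k) * scale for k in keys])
        # Weighted sum of the value vectors (here the same as the keys).
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, keys)) for d in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    target = [[1.0, 0.0]]
    reference = [[0.0, 1.0]]
    print(shared_attention(target, reference))
```

With an empty `reference_tokens` list the function reduces to ordinary self-attention over the target tokens, which is one way to see what the reference images add.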
Findings
Outperforms existing encoder-based customization methods
Uses synthetic multi-image dataset for training
Improves image quality and consistency in generated images
Abstract
Customization of text-to-image models enables users to insert new concepts or objects and generate them in unseen settings. Existing methods either rely on comparatively expensive test-time optimization or train encoders on single-image datasets without multi-image supervision, which can limit image quality. We propose a simple approach to address these challenges. We first leverage existing text-to-image models and 3D datasets to create a high-quality Synthetic Customization Dataset (SynCD) consisting of multiple images of the same object in different lighting, backgrounds, and poses. Using this dataset, we train an encoder-based model that incorporates fine-grained visual details from reference images via a shared attention mechanism. Finally, we propose an inference technique that normalizes text and image guidance vectors to mitigate overexposure issues in sampled images. Through…
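The abstract does not spell out how the text and image guidance vectors are normalized, so the following is only a sketch of one plausible scheme: both guidance vectors are rescaled to a common norm (here, the smaller of the two) before being added to the unconditional prediction. The rescaling rule, the decomposition into an image guidance term and a text guidance term, and all names (`normalized_guidance`, `image_scale`, `text_scale`) are assumptions made for illustration.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def rescale(v, target_norm):
    """Rescale v to have the given norm; zero vectors are returned unchanged."""
    n = norm(v)
    if n == 0.0:
        return list(v)
    return [x * target_norm / n for x in v]


def normalized_guidance(eps_uncond, eps_image, eps_text_image,
                        image_scale, text_scale):
    """Combine noise predictions with norm-matched guidance vectors.

    Assumption: the image guidance is (eps_image - eps_uncond) and the
    text guidance is (eps_text_image - eps_image); both are rescaled to
    the smaller of their two norms so neither term dominates.
    """
    g_image = [a - b for a, b in zip(eps_image, eps_uncond)]
    g_text = [a - b for a, b in zip(eps_text_image, eps_image)]
    common = min(norm(g_image), norm(g_text))
    g_image = rescale(g_image, common)
    g_text = rescale(g_text, common)
    return [
        u + image_scale * gi + text_scale * gt
        for u, gi, gt in zip(eps_uncond, g_image, g_text)
    ]


if __name__ == "__main__":
    print(normalized_guidance([0.0, 0.0], [3.0, 4.0], [3.0, 5.0], 1.0, 1.0))
```

The intuition behind this kind of normalization is that an unusually large guidance vector pushes the sample far from the unconditional prediction, which is one way oversaturated or overexposed outputs can arise.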
Taxonomy
Topics: 3D Modeling in Geospatial Applications · Computer Graphics and Visualization Techniques · Image Retrieval and Classification Techniques
Methods: Softmax · Attention Is All You Need
