Generating Multi-Image Synthetic Data for Text-to-Image Customization

Nupur Kumari; Xi Yin; Jun-Yan Zhu; Ishan Misra; Samaneh Azadi

arXiv:2502.01720 · cs.CV · October 14, 2025

TL;DR

This paper customizes text-to-image models by building a synthetic multi-image dataset (SynCD) and training an encoder-based model on it. The model draws fine-grained visual details from reference images, which improves the quality of generated images.

Contribution

The authors propose a novel approach using a synthetic multi-image dataset and an encoder-based model with a shared attention mechanism for better customization of text-to-image generation.
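To make the "shared attention" idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes that queries come from the target image's features while keys and values are computed over the target and reference features concatenated together, so the target can read details from the references. The function name, the single-head layout, and the projection matrices `w_q`, `w_k`, `w_v` are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def shared_attention(target_feats, ref_feats, w_q, w_k, w_v):
    """Single-head attention where target tokens attend to target + reference tokens.

    target_feats: (n_t, d) features of the image being generated (assumed layout)
    ref_feats:    (n_r, d) features of the reference image(s) (assumed layout)
    """
    q = target_feats @ w_q                                # queries: target only
    joint = np.concatenate([target_feats, ref_feats], 0)  # shared key/value pool
    k = joint @ w_k
    v = joint @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])               # scaled dot product
    weights = softmax(scores, axis=-1)                    # (n_t, n_t + n_r)
    return weights @ v, weights

# Toy usage with random features and projections (hypothetical sizes).
rng = np.random.default_rng(0)
d = 8
target = rng.normal(size=(4, d))
refs = rng.normal(size=(6, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, weights = shared_attention(target, refs, w_q, w_k, w_v)
```

Each target token receives one output vector, and each row of `weights` spreads its attention across both target and reference tokens.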

Findings

01. Outperforms existing encoder-based customization methods
02. Uses synthetic multi-image dataset for training
03. Improves image quality and consistency in generated images

Abstract

Customization of text-to-image models enables users to insert new concepts or objects and generate them in unseen settings. Existing methods either rely on comparatively expensive test-time optimization or train encoders on single-image datasets without multi-image supervision, which can limit image quality. We propose a simple approach to address these challenges. We first leverage existing text-to-image models and 3D datasets to create a high-quality Synthetic Customization Dataset (SynCD) consisting of multiple images of the same object in different lighting, backgrounds, and poses. Using this dataset, we train an encoder-based model that incorporates fine-grained visual details from reference images via a shared attention mechanism. Finally, we propose an inference technique that normalizes text and image guidance vectors to mitigate overexposure issues in sampled images. Through…
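The abstract does not give the formula for the guidance normalization, so the following NumPy sketch shows only one plausible reading, labeled as an assumption: split classifier-free guidance into an image guidance vector and a text guidance vector, rescale both to a common norm, and then combine them with their scales. The function name, the choice of the smaller norm as the common target, and the default scale values are hypothetical and are not taken from the paper.

```python
import numpy as np

def normalized_guidance(eps_uncond, eps_img, eps_img_txt,
                        s_img=3.0, s_txt=7.5, tiny=1e-8):
    """Combine image and text guidance after rescaling them to a common norm.

    eps_uncond:  model prediction with no conditioning
    eps_img:     prediction conditioned on the reference image(s) only
    eps_img_txt: prediction conditioned on reference image(s) and text
    All three are arrays of the same shape. The rule below is an assumption.
    """
    g_img = eps_img - eps_uncond          # image guidance vector
    g_txt = eps_img_txt - eps_img         # text guidance vector
    n_img = np.linalg.norm(g_img)
    n_txt = np.linalg.norm(g_txt)
    target = min(n_img, n_txt)            # assumed common norm: the smaller one
    g_img = g_img * (target / (n_img + tiny))
    g_txt = g_txt * (target / (n_txt + tiny))
    return eps_uncond + s_img * g_img + s_txt * g_txt

# Toy usage: the text guidance (norm 4) is shrunk to match the image guidance (norm 3).
eps_uncond = np.zeros(4)
eps_img = np.array([3.0, 0.0, 0.0, 0.0])
eps_img_txt = np.array([3.0, 4.0, 0.0, 0.0])
combined = normalized_guidance(eps_uncond, eps_img, eps_img_txt, s_img=1.0, s_txt=1.0)
```

The intuition behind such a rule is that neither guidance term can dominate the update purely because of its magnitude, which is one way large guidance scales can lead to overexposed samples.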

Code & Models

Datasets

nupurkmr9/syncd
dataset · 76 dl

Taxonomy

Topics: 3D Modeling in Geospatial Applications · Computer Graphics and Visualization Techniques · Image Retrieval and Classification Techniques

Methods: Softmax · Attention Is All You Need