JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
Yu Zeng, Vishal M. Patel, Haochen Wang, Xun Huang, Ting-Chun Wang, Ming-Yu Liu, Yogesh Balaji

TL;DR
JeDi introduces a finetuning-free method for personalized text-to-image generation by learning joint distributions of related image-text pairs, enabling high-quality, identity-preserving image synthesis using reference images.
Contribution
The paper presents a novel joint-image diffusion approach that eliminates the need for finetuning, improving personalization quality and efficiency in text-to-image models.
Findings
Achieves state-of-the-art generation quality.
Outperforms prior finetuning-based and finetuning-free methods.
Enables fast personalization with reference images.
Abstract
Personalized text-to-image generation models enable users to create images that depict their individual possessions in diverse scenes, finding applications in various domains. To achieve the personalization capability, existing methods rely on finetuning a text-to-image foundation model on a user's custom dataset, which can be non-trivial for general users, resource-intensive, and time-consuming. Despite attempts to develop finetuning-free methods, their generation quality is much lower compared to their finetuning counterparts. In this paper, we propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model. Our key idea is to learn the joint distribution of multiple related text-image pairs that share a common subject. To facilitate learning, we propose a scalable synthetic dataset generation technique. Once trained, our model…
Taxonomy
Topics: Image Retrieval and Classification Techniques · AI in cancer detection · Video Analysis and Summarization
Methods: Diffusion
