A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization
Xingzhe He, Zhiwen Cao, Nicholas Kolkin, Lantao Yu, Kun Wan, Helge Rhodin, Ratheesh Kalarot

TL;DR
This paper introduces a data-centric regularization strategy for personalized text-to-image models. Without modifying the model architecture, it significantly improves identity preservation of specific subjects, such as pets, down to fine details like text and logos.
Contribution
A novel data generation approach that enhances identity preservation in text-to-image models, applies across different architectures, and sets new state-of-the-art results.
Findings
Improved identity preservation on benchmarks
Enhanced text alignment accuracy
Applicable to various model architectures
Abstract
Large text-to-image models have revolutionized the ability to generate imagery from natural language. However, particularly unique or personal visual concepts, such as a specific pet or piece of furniture, are not captured by the original model. This has led to interest in how to personalize a text-to-image model. Despite significant progress, this task remains a formidable challenge, particularly in preserving the subject's identity. Most researchers attempt to address this issue by modifying model architectures. These methods can keep the subject's structure and color but fail to preserve identity details. To address this issue, our approach takes a data-centric perspective. We introduce a novel regularization dataset generation strategy at both the text and image levels. This strategy enables the model to preserve fine details of the desired subjects, such as text and logos. Our method is…
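The abstract does not say how the regularization data enters training. For context only, a common way regularization images are used in diffusion personalization is DreamBooth's prior-preservation loss: the usual denoising loss on the subject images plus a weighted denoising loss on a regularization set. The sketch below assumes that standard setup; it is not the authors' text- and image-level generation strategy, and the names `prior_preservation_loss` and `lam` are hypothetical.

```python
import numpy as np


def prior_preservation_loss(noise_subj, pred_subj, noise_reg, pred_reg, lam=1.0):
    """DreamBooth-style objective (illustrative; not this paper's method).

    noise_*: Gaussian noise that was added to each latent.
    pred_*:  the denoiser's noise prediction for that noisy latent.
    The first term fits the personal subject; the second, weighted by the
    hypothetical `lam`, penalizes drift on the regularization set.
    """
    subject_term = np.mean((noise_subj - pred_subj) ** 2)
    reg_term = np.mean((noise_reg - pred_reg) ** 2)
    return subject_term + lam * reg_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (2, 4, 8, 8)  # hypothetical batch of latents (B, C, H, W)
    noise_s = rng.standard_normal(shape)
    noise_r = rng.standard_normal(shape)
    # Stand-in predictions; in a real pipeline these come from the denoising network.
    pred_s = noise_s + 0.1 * rng.standard_normal(shape)
    pred_r = noise_r + 0.2 * rng.standard_normal(shape)
    print(prior_preservation_loss(noise_s, pred_s, noise_r, pred_r, lam=1.0))
```

In this framing, a data-centric method changes what goes into the regularization batch (which images and which prompts), while the loss itself stays the same.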
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Image and Video Retrieval Techniques
