A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization

Xingzhe He; Zhiwen Cao; Nicholas Kolkin; Lantao Yu; Kun Wan; Helge Rhodin; Ratheesh Kalarot

arXiv:2311.04315 · cs.CV · November 7, 2024 · 2 cites

Open Access

TL;DR

This paper introduces a data-centric regularization strategy for personalizing text-to-image models. It improves identity preservation for specific subjects, including fine details such as text and logos, without modifying the model architecture.

Contribution

A novel regularization dataset generation approach, applied at both the text and image level, that enhances identity preservation in personalized text-to-image models. It is applicable across different architectures and sets new state-of-the-art results.

Findings

01. Improved identity preservation on benchmarks
02. Enhanced text alignment accuracy
03. Applicable to various model architectures

Abstract

Large text-to-image models have revolutionized the ability to generate imagery using natural language. However, particularly unique or personal visual concepts, such as pets and furniture, will not be captured by the original model. This has led to interest in how to personalize a text-to-image model. Despite significant progress, this task remains a formidable challenge, particularly in preserving the subject's identity. Most researchers attempt to address this issue by modifying model architectures. These methods are capable of keeping the subject structure and color but fail to preserve identity details. To address this issue, our approach takes a data-centric perspective. We introduce a novel regularization dataset generation strategy on both the text and image level. This strategy enables the model to preserve fine details of the desired subjects, such as text and logos. Our method is…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Image and Video Retrieval Techniques