Cross Initialization for Personalized Text-to-Image Generation

Lianyu Pang; Jian Yin; Haoran Xie; Qiping Wang; Qing Li; Xudong Mao

arXiv:2312.15905 · cs.CV · December 27, 2023 · 2 cites

TL;DR

This paper introduces Cross Initialization, a novel method for personalized text-to-image generation that improves reconstruction quality and editability by narrowing the gap between initial and learned embeddings, reducing optimization steps, and enabling facial expression editing.

Contribution

The paper proposes Cross Initialization, a new approach that enhances personalization in text-to-image models by improving embedding initialization, reducing training steps, and enabling better facial expression editing.

Findings

01. Cross Initialization significantly narrows the gap between the initial and learned embeddings.

02. Reduces optimization steps from 5000 to 320 (about 15.6× fewer).

03. Enables successful facial expression editing.

Abstract

Recently, there has been a surge in face personalization techniques, benefiting from the advanced capabilities of pretrained text-to-image diffusion models. Among these, a notable method is Textual Inversion, which generates personalized images by inverting given images into textual embeddings. However, methods based on Textual Inversion still struggle with balancing the trade-off between reconstruction quality and editability. In this study, we examine this issue through the lens of initialization. Upon closely examining traditional initialization methods, we identified a significant disparity between the initial and learned embeddings in terms of both scale and orientation. The scale of the learned embedding can be up to 100 times greater than that of the initial embedding. Such a significant change in the embedding could increase the risk of overfitting, thereby compromising the…
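The scale and orientation disparity described in the abstract can be made concrete with a small measurement. The sketch below, in plain Python, compares two vectors by the ratio of their norms (scale) and by their cosine similarity (orientation). It is only an illustration: the function name `embedding_gap` is hypothetical, and the vectors are toy values chosen for readability, not embeddings or results from the paper.

```python
import math


def embedding_gap(initial, learned):
    """Return (scale_ratio, cosine_similarity) for two embedding vectors.

    scale_ratio is ||learned|| / ||initial||; cosine_similarity measures
    how closely the two vectors point in the same direction.
    """
    if len(initial) != len(learned):
        raise ValueError("embeddings must have the same dimension")
    norm_init = math.sqrt(sum(x * x for x in initial))
    norm_learned = math.sqrt(sum(x * x for x in learned))
    if norm_init == 0.0 or norm_learned == 0.0:
        raise ValueError("embeddings must be non-zero")
    dot = sum(a * b for a, b in zip(initial, learned))
    return norm_learned / norm_init, dot / (norm_init * norm_learned)


# Toy values only (assumption): a small initial vector and a much larger
# "learned" vector pointing in a different direction.
initial = [0.01, 0.02, -0.02]
learned = [2.0, -1.0, 2.0]

scale_ratio, cosine = embedding_gap(initial, learned)
print(f"scale ratio: {scale_ratio:.1f}, cosine similarity: {cosine:.3f}")
```

A scale ratio far from 1, or a cosine similarity well below 1, indicates that the learned embedding has moved a long way from its starting point, which is the kind of gap the abstract attributes to traditional initialization.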

Code & Models

Repositories

lyupang/crossinitialization
PyTorch · Official

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion