PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Zhen Li; Mingdeng Cao; Xintao Wang; Zhongang Qi; Ming-Ming Cheng; Ying Shan

arXiv:2312.04461 · cs.CV · December 8, 2023 · 1 citation

TL;DR

PhotoMaker is an efficient personalized text-to-image generation method that encodes multiple ID images into a unified embedding, enabling high fidelity, flexible control, and fast generation of realistic human photos.

Contribution

It introduces a stacked ID embedding approach and an ID-oriented data construction pipeline, improving ID preservation, efficiency, and generalization in personalized human photo synthesis.

Findings

01. Outperforms test-time fine-tuning methods in ID preservation
02. Provides faster generation with high-quality results
03. Demonstrates strong generalization and versatile applications

Abstract

Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information. Such an embedding, serving as a unified ID representation, can not only encapsulate the characteristics of the same input ID comprehensively, but also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. Besides, to drive the training of our…
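
The stacking idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each ID image has already been encoded into a fixed-length vector by some image encoder (not shown), and it assumes that "stacking" means keeping one row per input image. The function name `stack_id_embeddings` and the toy vectors are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def stack_id_embeddings(per_image_embeddings: Sequence[Sequence[float]]) -> List[Vector]:
    """Stack N per-image embeddings (each of length D) into an N x D matrix.

    Assumption: stacking keeps one row per input image, so the number of
    input images can vary while the embedding dimension D stays fixed.
    """
    if not per_image_embeddings:
        raise ValueError("at least one ID image embedding is required")
    dim = len(per_image_embeddings[0])
    stacked: List[Vector] = []
    for emb in per_image_embeddings:
        if len(emb) != dim:
            raise ValueError("all embeddings must share the same dimension")
        stacked.append([float(x) for x in emb])
    return stacked


# Toy embeddings (hypothetical values) for three images of the same person.
same_id = stack_id_embeddings([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

# Images of two different people can share one stack, which is the sense in
# which the representation can "accommodate the characteristics of different IDs".
person_a = [[0.1, 0.2], [0.3, 0.4]]
person_b = [[0.9, 0.8], [0.7, 0.6]]
mixed_id = stack_id_embeddings(person_a + person_b)
```

In this sketch `same_id` has three rows and `mixed_id` has four, each of dimension two. How the stacked rows are then consumed by the diffusion model is outside what the abstract states and is not shown here.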

Code & Models

Repositories

TencentARC/PhotoMaker (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings