ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Jiehui Huang; Xiao Dong; Wenhui Song; Zheng Chong; Zhenchao Tang; Jun Zhou; Yuhao Cheng; Long Chen; Hanhui Li; Yiqiang Yan; Shengcai Liao; Xiaodan Liang

arXiv:2404.16771·cs.CV·December 31, 2024·2 cites

TL;DR

ConsistentID is a novel method for high-fidelity, identity-preserving portrait generation using multimodal facial prompts and a new dataset, achieving superior accuracy and diversity with efficient inference.

Contribution

The paper introduces ConsistentID, a new approach that combines multimodal facial prompts with an ID-preservation network, along with FGID, a large dataset for training, to improve facial identity consistency in portrait generation.

Findings

01. Outperforms existing methods in identity preservation and diversity.

02. Achieves fast inference speed despite multimodal complexity.

03. Demonstrates high accuracy on the MyStyle dataset.

Abstract

Diffusion-based technologies have made significant strides, particularly in personalized and customized facial generation. However, existing methods face challenges in achieving high-fidelity and detailed identity (ID) consistency, primarily due to insufficient fine-grained control over facial areas and the lack of a comprehensive strategy for ID preservation by fully considering intricate facial details and the overall face. To address these limitations, we introduce ConsistentID, an innovative method crafted for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts, utilizing only a single reference image. ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network…

Code & Models

Repositories

JackAILab/ConsistentID
PyTorch · Official

Models

JackAILab/ConsistentID (Hugging Face)
model · 106 downloads · 8 likes

Datasets

JackAILab/FGID
dataset · 78 downloads
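
The model and dataset listed above can presumably be fetched with the `huggingface_hub` client. The following is a minimal sketch, not the authors' documented workflow: it assumes both repositories are hosted on the Hugging Face Hub under the ids shown on this page, and it makes no assumption about the files inside them. `HubRepo`, `REPOS`, `download`, and the `./hf_cache` directory are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubRepo:
    """A Hub repository reference (hypothetical helper type)."""
    repo_id: str
    repo_type: str  # "model" or "dataset"


# Repo ids copied from this page; hosting on the Hugging Face Hub is an assumption.
REPOS = [
    HubRepo("JackAILab/ConsistentID", "model"),
    HubRepo("JackAILab/FGID", "dataset"),
]


def download(repo: HubRepo, local_root: str = "./hf_cache") -> str:
    """Download a full snapshot of `repo` and return the local directory path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # One sub-directory per repo; "/" in the id is replaced to keep the path flat.
    target = f"{local_root}/{repo.repo_id.replace('/', '__')}"
    return snapshot_download(
        repo_id=repo.repo_id,
        repo_type=repo.repo_type,
        local_dir=target,
    )
```

Calling `download(REPOS[0])` would pull the model snapshot and `download(REPOS[1])` the dataset; both need network access and may transfer large files, so check the repository pages for sizes and license terms first.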

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Image Retrieval and Classification Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings