FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation
Yunpeng Zhang, Qiang Wang, Fan Jiang, Yaqi Fan, Mu Xu, Yonggang Qi

TL;DR
FantasyID is a tuning-free video generation framework that enhances face knowledge in diffusion transformers, ensuring identity preservation and realistic facial dynamics through 3D priors and adaptive feature injection.
Contribution
It introduces a novel face knowledge enhancement method with 3D priors and adaptive feature guidance for identity-preserving text-to-video synthesis.
Findings
Outperforms existing tuning-free IPT2V methods.
Effectively maintains facial identity during dynamic video synthesis.
Improves facial expression and pose diversity in generated videos.
Abstract
Tuning-free approaches that adapt large-scale pre-trained video diffusion models for identity-preserving text-to-video generation (IPT2V) have recently gained popularity due to their efficacy and scalability. However, significant challenges remain in achieving satisfactory facial dynamics while keeping the identity unchanged. In this work, we present FantasyID, a novel tuning-free IPT2V framework that enhances the face knowledge of a pre-trained video model built on diffusion transformers (DiT). Specifically, a 3D facial geometry prior is incorporated to ensure plausible facial structures during video synthesis. To prevent the model from learning copy-paste shortcuts that simply replicate the reference face across frames, a multi-view face augmentation strategy is devised to capture diverse 2D facial appearance features, thereby increasing the dynamics of facial expressions and head poses.…
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques
