DiffMagicFace: Identity Consistent Facial Editing of Real Videos
Huanghao Yin, Shenkun Xu, Kanle Shi, Junhai Yong, Bin Wang

TL;DR
DiffMagicFace is a novel framework for identity-preserving, consistent facial video editing using text and image control models, achieving high-quality results without relying on video datasets.
Contribution
The paper introduces a new video editing method that maintains facial identity and consistency across frames, leveraging fine-tuned models and a specialized dataset creation process.
Findings
High-quality, identity-preserving video edits achieved.
Outperforms current state-of-the-art methods in both visual quality and quantitative metrics.
Effective on complex tasks like talking head videos.
Abstract
Text-conditioned image editing has greatly benefited from advances in image diffusion models. However, extending these techniques to facial video editing introduces challenges in preserving facial identity throughout the source video and ensuring consistency of the edited subject across frames. In this paper, we introduce DiffMagicFace, a unique video editing framework that integrates two fine-tuned models for text and image control. These models operate concurrently during inference to produce video frames that maintain identity features while seamlessly aligning with the editing semantics. To ensure the consistency of the edited videos, we develop a dataset comprising images showcasing various facial perspectives for each edited subject. The dataset is created through rendering techniques followed by optimization algorithms. Remarkably,…
