Zero-shot Text-driven Physically Interpretable Face Editing
Yapeng Meng, Songru Yang, Xu Hu, Rui Zhao, Lincheng Li, Zhenwei Shi, Zhengxia Zou

TL;DR
This paper introduces a physically interpretable, text-driven face editing method that models image manipulation as vector flow fields, optimizing them with CLIP guidance for high-quality, identity-preserving results.
Contribution
It presents a novel face editing paradigm that represents edits as vector flow fields, either explicitly or implicitly, optimized under CLIP guidance, and offers a fast, one-shot extension capable of real-time editing.
Findings
Produces high-quality, identity-preserving face edits
Achieves physically interpretable manipulation results
Enables real-time video face editing
Abstract
This paper proposes a novel and physically interpretable method for face editing based on arbitrary text prompts. Different from previous GAN-inversion-based face editing methods that manipulate the latent space of GANs, or diffusion-based methods that model image manipulation as a reverse diffusion process, we regard the face editing process as imposing vector flow fields on face images, representing the offset of spatial coordinates and color for each image pixel. Under the above-proposed paradigm, we represent the vector flow field in two ways: 1) explicitly represent the flow vectors with rasterized tensors, and 2) implicitly parameterize the flow vectors as continuous, smooth, and resolution-agnostic neural fields, by leveraging the recent advances of implicit neural representations. The flow vectors are iteratively optimized under the guidance of the pre-trained Contrastive…
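The two flow representations described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `apply_flow` and `implicit_flow`, the backward-warping convention, the bilinear sampling, the sine activation, and the tiny two-layer network are all assumptions chosen for illustration. The CLIP-guided optimization loop is omitted entirely; the sketch only shows how a spatial-plus-color flow field could be applied to an image, and how a coordinate network could produce such a field at any resolution.

```python
import numpy as np

def apply_flow(image, spatial_flow, color_flow):
    """Warp `image` by per-pixel spatial offsets, then add per-pixel color offsets.

    image:        (H, W, 3) float array in [0, 1]
    spatial_flow: (H, W, 2) offsets (dx, dy) in pixels; output pixel (x, y)
                  samples the input at (x + dx, y + dy) (assumed convention)
    color_flow:   (H, W, 3) additive RGB offsets
    """
    h, w, _ = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Source coordinates, clamped to the image bounds.
    sx = np.clip(xs + spatial_flow[..., 0], 0, w - 1)
    sy = np.clip(ys + spatial_flow[..., 1], 0, h - 1)
    # Bilinear interpolation between the four neighbouring pixels.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (sx - x0)[..., None]
    wy = (sy - y0)[..., None]
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    warped = top * (1 - wy) + bottom * wy
    return np.clip(warped + color_flow, 0.0, 1.0)

def implicit_flow(h, w, params):
    """Evaluate a tiny coordinate network (hypothetical) on an (h, w) grid.

    The network maps normalized (x, y) in [-1, 1]^2 to five values:
    two spatial offsets (normalized units) and three color offsets.
    """
    w1, b1, w2, b2 = params
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    coords = np.stack([xs, ys], axis=-1)           # (h, w, 2)
    hidden = np.sin(coords @ w1 + b1)              # (h, w, hidden)
    out = hidden @ w2 + b2                         # (h, w, 5)
    # Convert normalized spatial offsets to pixel units for this resolution.
    spatial = out[..., :2] * np.array([(w - 1) / 2, (h - 1) / 2])
    return spatial, out[..., 2:]

# Usage: the same (random, untrained) parameters yield a flow at any resolution.
rng = np.random.default_rng(0)
params = (rng.normal(size=(2, 16)), np.zeros(16),
          0.01 * rng.normal(size=(16, 5)), np.zeros(5))
face = rng.random((32, 32, 3))                     # stand-in for a face image
spatial, color = implicit_flow(32, 32, params)
edited = apply_flow(face, spatial, color)
```

In the explicit variant, `spatial_flow` and `color_flow` would themselves be the optimized tensors; in the implicit variant, the network parameters would be optimized instead, which is what makes the field continuous and independent of the image resolution.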
Taxonomy
Topics: Face recognition and analysis · Facial Nerve Paralysis Treatment and Research
Methods: Diffusion
