ClipFace: Text-guided Editing of Textured 3D Morphable Models
Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner

TL;DR
ClipFace introduces a self-supervised method for editing textured 3D face models using natural language prompts, enabling control over expression and appearance with high-quality texture synthesis guided by differentiable rendering and CLIP-based losses.
Contribution
It presents a novel self-supervised framework that jointly synthesizes expressive, textured, and articulated 3D faces controlled by language prompts, improving controllability and texture quality.
Findings
Enables text-guided editing of 3D face models.
Produces high-quality, temporally consistent textures.
Uses CLIP and differentiable rendering for training.
Abstract
We propose ClipFace, a novel self-supervised approach for text-guided editing of textured 3D morphable models of faces. Specifically, we employ user-friendly language prompts to enable control of the expressions as well as appearance of 3D faces. We leverage the geometric expressiveness of 3D morphable models, which inherently possess limited controllability and texture expressivity, and develop a self-supervised generative model to jointly synthesize expressive, textured, and articulated faces in 3D. We enable high-quality texture generation for 3D faces by adversarial self-supervised training, guided by differentiable rendering against collections of real RGB images. Controllable editing and manipulation are given by language prompts to adapt texture and expression of the 3D morphable model. To this end, we propose a neural network that predicts both texture and expression latent codes…
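The abstract as shown here does not spell out the form of the CLIP-based loss. As a rough illustration only, text-guided objectives of this kind are often written as one minus the cosine similarity between the embedding of a rendered image and the embedding of the text prompt. The sketch below assumes both embeddings have already been computed by some encoder; the function names `cosine_similarity` and `clip_style_loss` are hypothetical and are not taken from the paper or from any CLIP library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_style_loss(image_embedding, text_embedding):
    """Hypothetical text-guidance loss: 0 when the embeddings are aligned,
    1 when they are orthogonal, 2 when they point in opposite directions.
    This is an assumed formulation, not the authors' implementation."""
    return 1.0 - cosine_similarity(image_embedding, text_embedding)


# Toy usage with made-up 2D "embeddings".
aligned = clip_style_loss([1.0, 0.0], [2.0, 0.0])
orthogonal = clip_style_loss([1.0, 0.0], [0.0, 3.0])
```

In a pipeline like the one the abstract describes, a loss of this shape would be minimized with respect to the predicted texture and expression latent codes, with gradients flowing back through the differentiable renderer.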
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
Methods: Contrastive Language-Image Pre-training
