Text and Image Guided 3D Avatar Generation and Manipulation
Zehranaz Canfes, M. Furkan Atasoy, Alara Dirik, Pinar Yanardag

TL;DR
This paper introduces a method for manipulating the shape and texture of 3D face avatars with text or image prompts. It combines CLIP with a pre-trained 3D GAN in a fully differentiable rendering pipeline, and each manipulation takes about 5 minutes.
Contribution
It presents a new 3D manipulation technique that controls both shape and texture through text or image prompts by combining CLIP with a pre-trained 3D GAN.
Findings
Each manipulation takes only 5 minutes.
Text or image prompts effectively control both shape and texture.
Extensive comparisons show superior results.
Abstract
The manipulation of latent space has recently become an interesting topic in the field of generative models. Recent research shows that latent directions can be used to manipulate images towards certain attributes. However, controlling the generation process of 3D generative models remains a challenge. In this work, we propose a novel 3D manipulation method that can manipulate both the shape and texture of the model using text or image-based prompts such as 'a young face' or 'a surprised face'. We leverage the power of the Contrastive Language-Image Pre-training (CLIP) model and a pre-trained 3D GAN model designed to generate face avatars, and create a fully differentiable rendering pipeline to manipulate meshes. More specifically, our method takes an input latent code and modifies it such that the target attribute specified by a text or image prompt is present or enhanced, while leaving…
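The latent-editing idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the matrix `A` is a hypothetical linear stand-in for the whole "3D GAN + differentiable renderer + CLIP image encoder" chain, `target` stands in for a CLIP prompt embedding, and finite differences replace the automatic differentiation a real pipeline would use. All names and values below are assumptions made for illustration.

```python
import math

# Hypothetical stand-in for generator + differentiable renderer + image encoder:
# a fixed linear map from a 4-D latent code to a 3-D embedding.
A = [
    [0.9, 0.1, -0.3, 0.2],
    [-0.2, 0.8, 0.4, 0.1],
    [0.3, -0.5, 0.7, 0.6],
]


def embed(latent):
    """Map a latent code to an embedding (toy replacement for render + encode)."""
    return [sum(a * x for a, x in zip(row, latent)) for row in A]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def loss(latent, offset, target):
    """1 - cosine similarity between the edited embedding and the target."""
    edited = [w + d for w, d in zip(latent, offset)]
    return 1.0 - cosine(embed(edited), target)


def optimize_offset(latent, target, steps=200, lr=0.1, eps=1e-5):
    """Gradient descent on a latent offset, using central finite differences."""
    offset = [0.0] * len(latent)
    for _ in range(steps):
        grad = []
        for i in range(len(offset)):
            plus = offset[:]
            plus[i] += eps
            minus = offset[:]
            minus[i] -= eps
            diff = loss(latent, plus, target) - loss(latent, minus, target)
            grad.append(diff / (2 * eps))
        offset = [d - lr * g for d, g in zip(offset, grad)]
    return offset


latent = [0.5, -0.2, 0.1, 0.3]   # assumed input latent code
target = [0.0, 1.0, 0.0]         # assumed embedding of the prompt
offset = optimize_offset(latent, target)
before = loss(latent, [0.0] * len(latent), target)
after = loss(latent, offset, target)
print(f"loss before: {before:.3f}, loss after: {after:.3f}")
```

The sketch only optimizes similarity to the prompt embedding. Any additional terms the method uses to constrain the edit are not modeled here.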
Videos
Text and Image Guided 3D Avatar Generation and Manipulation · YouTube
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation
