GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

Keqiang Sun; Amin Jourabloo; Riddhish Bhalodia; Moustafa Meshry; Yu Rong; Zhengyu Yang; Thu Nguyen-Phuoc; Christian Haene; Jiu Xu; Sam Johnson; Hongsheng Li; Sofien Bouaziz

arXiv:2408.13674 · cs.CV · August 27, 2024

Open Access

TL;DR

This paper introduces GenCA, a text-conditioned generative model that creates realistic, detailed, and controllable 3D avatars, overcoming limitations of traditional methods and enabling diverse applications like editing and reconstruction.

Contribution

We develop a novel text-conditioned generative model that produces high-fidelity, editable 3D avatars with complete facial details and robust drivability, advancing beyond existing static or limited-avatar approaches.

Findings

01. Generates diverse, photo-realistic 3D avatars from text prompts.

02. Enables controllable avatar expression and identity manipulation.

03. Supports downstream tasks like avatar editing and single-shot reconstruction.

Abstract

Photo-realistic and controllable 3D avatars are crucial for various applications such as virtual and mixed reality (VR/MR), telepresence, gaming, and film production. Traditional methods for avatar creation often involve time-consuming scanning and reconstruction processes for each avatar, which limits their scalability. Furthermore, these methods do not offer the flexibility to sample new identities or modify existing ones. On the other hand, by learning a strong prior from data, generative models provide a promising alternative to traditional reconstruction methods, easing the time constraints for both data capture and processing. Additionally, generative methods enable downstream applications beyond reconstruction, such as editing and stylization. Nonetheless, the research on generative 3D avatars is still in its infancy, and therefore current methods still have limitations such as…

Taxonomy

Topics: Language and cultural evolution · Topic Modeling · Robotics and Automated Systems

Methods: Diffusion