Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model

Sungwon Hwang; Junha Hyung; Jaegul Choo

arXiv:2309.03550·cs.CV·September 8, 2023·2 cites

TL;DR

Text2Control3D generates 3D avatars from text and controls their facial expression with a casually captured monocular video. It optimizes a NeRF on viewpoint-aware images generated by a depth-conditioned ControlNet and addresses the viewpoint-agnostic texture problem with low-pass filtering.

Contribution

The paper combines diffusion-based, viewpoint-aware image generation (ControlNet conditioned on depth maps extracted from the input video) with a deformable NeRF to build text-driven 3D avatars whose facial expression is controllable.

Findings

01. Controls facial expression effectively via cross-reference attention.

02. Addresses the viewpoint-agnostic texture problem with low-pass filtering.

03. Constructs a deformable NeRF with per-image deformation fields.
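Finding 02 names low-pass filtering without giving details on this page. As a rough illustration of the general technique only, the sketch below attenuates the high spatial frequencies of a 2D Gaussian noise array with a Gaussian mask in the frequency domain. The function name `gaussian_low_pass`, the `cutoff` parameter, the array size, and the choice of input are all assumptions made for this example, not the paper's implementation.

```python
import numpy as np


def gaussian_low_pass(noise: np.ndarray, cutoff: float) -> np.ndarray:
    """Attenuate high spatial frequencies of a 2D array.

    `cutoff` is a hypothetical parameter: the standard deviation of the
    Gaussian frequency mask, in cycles per pixel.
    """
    h, w = noise.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies (cycles/pixel)
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies (cycles/pixel)
    # Mask equals 1 at zero frequency and decays toward high frequencies.
    mask = np.exp(-(fx**2 + fy**2) / (2.0 * cutoff**2))
    # Filter in the frequency domain, then return to the spatial domain.
    return np.fft.ifft2(np.fft.fft2(noise) * mask).real


# Illustrative input: a 64x64 array of standard Gaussian noise (assumed size).
rng = np.random.default_rng(0)
noise = rng.standard_normal((64, 64))
smooth = gaussian_low_pass(noise, cutoff=0.1)
```

Because the mask is 1 at zero frequency, the array mean is preserved while pixel-to-pixel variation shrinks. A smaller `cutoff` removes more fine detail.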

Abstract

Recent advances in diffusion models such as ControlNet have enabled geometrically controllable, high-fidelity text-to-image generation. However, none of them addresses the question of adding such controllability to text-to-3D generation. In response, we propose Text2Control3D, a controllable text-to-3D avatar generation method whose facial expression is controllable given a monocular video casually captured with a hand-held camera. Our main strategy is to construct the 3D avatar in Neural Radiance Fields (NeRF) optimized with a set of controlled viewpoint-aware images that we generate from ControlNet, whose condition input is the depth map extracted from the input video. When generating the viewpoint-aware images, we utilize cross-reference attention to inject well-controlled, referential facial expression and appearance via cross attention. We also conduct low-pass filtering of Gaussian…
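The abstract says that reference expression and appearance are injected through cross attention, but it does not give the formulation. One common way to share appearance across images, shown here purely as an assumed sketch, is to let the queries of the image being generated attend over its own keys and values concatenated with those of a reference image. The function name `cross_reference_attention`, the concatenation scheme, and the token shapes are hypothetical and may differ from the paper.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_reference_attention(q_target, k_target, v_target, k_ref, v_ref):
    """Target queries attend over target AND reference tokens.

    Assumption: keys and values are concatenated along the token axis;
    the paper's exact scheme is not stated in this summary.
    """
    k = np.concatenate([k_target, k_ref], axis=0)  # (n_t + n_r, d)
    v = np.concatenate([v_target, v_ref], axis=0)  # (n_t + n_r, d)
    # Scaled dot-product attention scores, shape (n_t, n_t + n_r).
    scores = q_target @ k.T / np.sqrt(q_target.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Illustrative shapes: 16 tokens per image, 8 channels (assumed values).
rng = np.random.default_rng(0)
q_t, k_t, v_t, k_r, v_r = (rng.standard_normal((16, 8)) for _ in range(5))
out, weights = cross_reference_attention(q_t, k_t, v_t, k_r, v_r)
# Fraction of each target token's attention that lands on the reference.
reference_share = weights[:, 16:].sum(axis=1)
```

In this sketch `reference_share` shows how much each target token draws from the reference image, which is the mechanism by which a shared reference could keep expression and appearance consistent across viewpoints.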

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques

Methods: Diffusion