Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions
Shaoxu Li

TL;DR
This paper introduces a method for creating and editing photo-realistic, animatable 3D head avatars from a short monocular video and text instructions, combining an image-conditioned diffusion model with a deformable neural radiance field.
Contribution
It presents a new approach combining diffusion models and neural radiance fields for text-guided avatar editing and synthesis from monocular videos.
Findings
Outperforms state-of-the-art methods in quantitative and qualitative studies on various subjects.
Produces animatable 3D neural head avatars with high fidelity.
Enables flexible editing of avatars based on textual instructions.
Abstract
We propose a method for synthesizing edited photo-realistic digital avatars from text instructions. Given a short monocular RGB video and text instructions, our method uses an image-conditioned diffusion model to edit one head image and a video stylization method to edit the remaining head images. Through iterative training and updating (three times or more), our method synthesizes edited, photo-realistic, animatable 3D neural head avatars using a deformable neural radiance field head synthesis method. In quantitative and qualitative studies on various subjects, our method outperforms state-of-the-art methods.
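The control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `build_edited_avatar` and the four callables (`edit_image`, `stylize_video`, `train_avatar`, `render_frames`) are hypothetical placeholders for the image-conditioned diffusion editor, the video stylization method, and the deformable NeRF trainer and renderer. The abstract does not say what the "update" step does, so the sketch assumes it re-renders the training frames from the current avatar.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")    # one head image from the input video
Avatar = TypeVar("Avatar")  # whatever the avatar trainer returns


def build_edited_avatar(
    frames: Sequence[Frame],
    instruction: str,
    edit_image: Callable[[Frame, str], Frame],
    stylize_video: Callable[[Sequence[Frame], int, Frame], List[Frame]],
    train_avatar: Callable[[Sequence[Frame]], Avatar],
    render_frames: Callable[[Avatar, int], List[Frame]],
    exemplar_index: int = 0,
    rounds: int = 3,  # the abstract says "three times or more"
) -> Avatar:
    """Hypothetical outline: edit one frame, propagate, then train iteratively."""
    if not frames:
        raise ValueError("frames must not be empty")
    if rounds < 1:
        raise ValueError("rounds must be at least 1")

    # Step 1: edit a single exemplar head image with the text instruction.
    edited_exemplar = edit_image(frames[exemplar_index], instruction)

    # Step 2: carry that edit over to the other head images.
    dataset = stylize_video(frames, exemplar_index, edited_exemplar)

    # Step 3: alternate avatar training and dataset update.
    avatar = train_avatar(dataset)
    for _ in range(rounds - 1):
        # ASSUMPTION: "update" means re-rendering frames from the avatar.
        dataset = render_frames(avatar, len(frames))
        avatar = train_avatar(dataset)
    return avatar
```

Passing the components as callables keeps the sketch independent of any particular diffusion, stylization, or NeRF library; real components would replace the placeholders.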
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Human Motion and Animation
Methods: Diffusion
