One-shot Implicit Animatable Avatars with Model-based Priors
Yangyi Huang, Hongwei Yi, Weiyang Liu, Haofan Wang, Boxi Wu, Wenxiao Wang, Binbin Lin, Debing Zhang, Deng Cai

TL;DR
ELICIT is a novel method that creates realistic, animatable 3D human avatars from just a single image by leveraging human body priors and visual semantics, outperforming existing methods.
Contribution
The paper introduces ELICIT, a single-image neural radiance field approach that incorporates 3D body shape and semantic priors for realistic avatar creation.
Findings
Outperforms baseline methods on multiple benchmarks.
Generates plausible full-body avatars from a single image.
Utilizes CLIP for text-conditioned unseen region generation.
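As a rough illustration of how a CLIP-style semantic prior could guide unseen regions, the sketch below scores a rendered view against a reference embedding with a cosine-similarity loss. It assumes the embeddings have already been produced by some encoder and are passed in as plain lists of floats. The names `cosine_similarity` and `semantic_loss` are hypothetical, and this is not the paper's actual loss or implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def semantic_loss(reference_embedding, rendered_embedding):
    """Hypothetical semantic loss: 0 when the embeddings point the same way,
    growing up to 2 as they point in opposite directions."""
    return 1.0 - cosine_similarity(reference_embedding, rendered_embedding)


# Toy embeddings standing in for encoder outputs (assumed values).
reference = [0.6, 0.8, 0.0]
rendered = [0.6, 0.8, 0.0]
loss = semantic_loss(reference, rendered)
```

In an optimization loop, a loss of this shape would be minimized so that renderings of unobserved body regions stay semantically close to the reference.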
Abstract
Existing neural rendering methods for creating human avatars typically either require dense input signals such as video or multi-view images, or leverage a learned prior from large-scale specific 3D human datasets such that reconstruction can be performed with sparse-view inputs. Most of these methods fail to achieve realistic reconstruction when only a single image is available. To enable the data-efficient creation of realistic animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. Inspired by the fact that humans can effortlessly estimate the body geometry and imagine full-body clothing from a single image, we leverage two priors in ELICIT: 3D geometry prior and visual semantic prior. Specifically, ELICIT utilizes the 3D body shape geometry prior from a skinned vertex-based template model (i.e., SMPL) and…
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · 3D Shape Modeling and Analysis
Methods: Contrastive Language-Image Pre-training
