Human-3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models
Yuxuan Xue, Xianghui Xie, Riccardo Marin, Gerard Pons-Moll

TL;DR
This paper introduces Human 3Diffusion, a novel method that combines 2D diffusion models with explicit 3D reconstruction to generate realistic, multi-view consistent avatars from a single RGB image, outperforming previous methods.
Contribution
The paper couples 2D multi-view diffusion with a 3D Gaussian Splats reconstruction model to improve 3D consistency and realism in avatar creation.
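To make the idea of coupling concrete, here is a toy sketch of one way a 2D multi-view denoiser and a 3D reconstruction step could alternate inside a sampling loop. It is not the authors' implementation: every function name is hypothetical, the "3D model" is replaced by a simple average over views, "rendering" is a broadcast, and feeding the rendered views back into the next sampling step is an assumption of this sketch rather than something stated above.

```python
import numpy as np

def denoise_views_2d(noisy_views, cond_image, rng):
    """Hypothetical stand-in for a 2D multi-view diffusion denoiser.

    Each view is predicted independently, so the predictions disagree slightly.
    """
    per_view_error = 0.1 * rng.standard_normal(noisy_views.shape)
    return 0.5 * noisy_views + 0.5 * cond_image + per_view_error

def reconstruct_3d(pred_views):
    """Hypothetical stand-in for a 3D reconstruction model.

    Fuses all predicted views into one shared representation (here: the mean).
    """
    return pred_views.mean(axis=0)

def render_views(shared_3d, num_views):
    """Toy 'renderer': every view is a copy of the shared representation."""
    return np.tile(shared_3d, (num_views, 1))

def coupled_sampling(cond_image, num_views=4, num_steps=10, seed=0):
    """Alternate 2D denoising and 3D fusion; assumed schedule, not from the paper."""
    rng = np.random.default_rng(seed)
    views = rng.standard_normal((num_views, cond_image.size))
    # Noise levels decrease linearly and end at exactly 0.
    noise_levels = np.linspace(1.0, 0.0, num_steps + 1)[1:]
    for sigma in noise_levels:
        pred = denoise_views_2d(views, cond_image, rng)   # 2D prior
        shared = reconstruct_3d(pred)                     # explicit shared state
        consistent = render_views(shared, num_views)      # consistent views
        # Re-noise the consistent views to the next (lower) noise level.
        views = consistent + sigma * rng.standard_normal(consistent.shape)
    return views

cond = np.linspace(0.0, 1.0, 8)   # toy "image" with 8 pixels
final_views = coupled_sampling(cond)
```

Because the last noise level is zero, the returned views all come from the same shared representation and therefore agree with each other, which is the consistency property the coupling is meant to provide. A real system would replace the average with a learned 3D Gaussian Splats model and the broadcast with view-dependent rendering.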
Findings
Outperforms state-of-the-art avatar creation methods
Achieves high-fidelity geometry and appearance from a single image
Validates the effectiveness of multi-view priors and explicit 3D refinement
Abstract
Creating realistic avatars from a single RGB image is an attractive yet challenging problem. Due to its ill-posed nature, recent works leverage powerful prior from 2D diffusion models pretrained on large datasets. Although 2D diffusion models demonstrate strong generalization capability, they cannot provide multi-view shape priors with guaranteed 3D consistency. We propose Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion. Our key insight is that 2D multi-view diffusion and 3D reconstruction models provide complementary information for each other, and by coupling them in a tight manner, we can fully leverage the potential of both models. We introduce a novel image-conditioned generative 3D Gaussian Splats reconstruction model that leverages the priors from 2D multi-view diffusion models, and provides an explicit 3D representation, which further guides the…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Human Motion and Animation · Computer Graphics and Visualization Techniques
Methods: Diffusion
