Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy
Yuxuan Xue, Xianghui Xie, Riccardo Marin, Gerard Pons-Moll

TL;DR
Gen-3Diffusion introduces a novel approach combining 2D and 3D diffusion models to generate realistic, multi-view consistent 3D objects and avatars from a single image, enhancing generalization and accuracy.
Contribution
The paper proposes a synchronized 2D and 3D diffusion framework that improves multi-view consistency and generalization in 3D object and avatar generation from single images.
Findings
Produces high-fidelity 3D objects and avatars
Enhances multi-view consistency in generated images
Demonstrates strong generalization to diverse shapes and clothing
Abstract
Creating realistic 3D objects and clothed avatars from a single RGB image is an attractive yet challenging problem. Due to its ill-posed nature, recent works leverage powerful priors from 2D diffusion models pretrained on large datasets. Although 2D diffusion models demonstrate strong generalization capability, they cannot guarantee that the generated multi-view images are 3D consistent. In this paper, we propose Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy. We leverage a pretrained 2D diffusion model and a 3D diffusion model via our elegantly designed process that synchronizes the two diffusion models at both training and sampling time. The synergy between the 2D and 3D diffusion models brings two major advantages: 1) 2D helps 3D in generalization: the pretrained 2D model has strong generalization ability to unseen images, providing strong shape priors for the…
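The abstract says the two diffusion models are synchronized at sampling time but, as truncated here, does not say how. The toy sketch below shows one plausible way to picture such coupling and should not be read as the authors' implementation. At every reverse step, a per-view 2D estimate of the clean images is replaced by a 3D-consistent version before the next, less noisy sample is computed. Everything in it is an assumption made for illustration: `toy_denoise_2d` and `toy_project_3d` are hypothetical stand-ins for the real networks, the "scene" is one in which every view shows the same content, and the update rule is a standard deterministic DDIM-style step.

```python
import numpy as np


def toy_denoise_2d(x_t, alpha_bar):
    """Hypothetical stand-in for a 2D diffusion model.

    Returns an independent clean-image estimate for each view.
    """
    return np.clip(x_t / np.sqrt(alpha_bar), -1.0, 1.0)


def toy_project_3d(x0_views):
    """Hypothetical stand-in for 3D reconstruction followed by re-rendering.

    In this toy scene all views show the same content, so the
    3D-consistent "renders" are simply the cross-view mean.
    """
    mean = x0_views.mean(axis=0, keepdims=True)
    return np.repeat(mean, x0_views.shape[0], axis=0)


def synchronized_sampling(num_views=4, dim=8, steps=10, seed=0):
    """Deterministic DDIM-style sampling with a 3D-consistency step."""
    rng = np.random.default_rng(seed)
    # Cumulative signal levels: index 0 is the noisiest, the last is clean.
    alpha_bars = np.linspace(0.01, 1.0, steps + 1)
    x = rng.standard_normal((num_views, dim))  # start from pure noise
    for i in range(steps):
        a_t, a_next = alpha_bars[i], alpha_bars[i + 1]
        x0_2d = toy_denoise_2d(x, a_t)   # per-view estimate, may disagree
        x0_3d = toy_project_3d(x0_2d)    # views forced to agree
        # Noise implied by the 3D-consistent estimate of the clean images.
        eps = (x - np.sqrt(a_t) * x0_3d) / np.sqrt(1.0 - a_t)
        # Move to the next, less noisy level using the synchronized estimate.
        x = np.sqrt(a_next) * x0_3d + np.sqrt(1.0 - a_next) * eps
    return x


views = synchronized_sampling()
```

Because the consistent estimate feeds every step of the reverse process, disagreement between views is corrected throughout sampling rather than only at the end. In this toy setup the final views are identical by construction.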
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Vision and Imaging
Methods: Diffusion
