Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation

Xuyi Meng; Chen Wang; Jiahui Lei; Kostas Daniilidis; Jiatao Gu; Lingjie Liu

arXiv:2501.05427 · cs.CV · January 10, 2025

Open Access

TL;DR

Zero-1-to-G leverages pretrained 2D diffusion models to enable direct 3D object generation from single views using Gaussian splats, incorporating cross-view attention for 3D consistency.

Contribution

Introduces the first method to use pretrained 2D diffusion priors for direct 3D generation via Gaussian splats, adding cross-view attention mechanisms for 3D consistency.

Findings

01. Achieves superior 3D generation quality on synthetic and real datasets.

02. Effectively captures 3D consistency through novel attention layers.

03. Demonstrates efficient training and strong generalization to unseen objects.

Abstract

Recent advances in 2D image generation have achieved remarkable quality, largely driven by the capacity of diffusion models and the availability of large-scale datasets. However, direct 3D generation is still constrained by the scarcity and lower fidelity of 3D datasets. In this paper, we introduce Zero-1-to-G, a novel approach that addresses this problem by enabling direct single-view generation on Gaussian splats using pretrained 2D diffusion models. Our key insight is that Gaussian splats, a 3D representation, can be decomposed into multi-view images encoding different attributes. This reframes the challenging task of direct 3D generation within a 2D diffusion framework, allowing us to leverage the rich priors of pretrained 2D diffusion models. To incorporate 3D awareness, we introduce cross-view and cross-attribute attention layers, which capture complex correlations and enforce 3D…
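
The decomposition and the two attention patterns described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the attribute layout (`ATTRS`), the tensor sizes, and every function name below are hypothetical, and the attention is a projection-free, single-head simplification of what a real diffusion backbone would use. It only shows how per-pixel Gaussians could be split into multi-view attribute images, and how tokens might be regrouped so that attention runs across views or across attributes.

```python
import numpy as np

# Hypothetical sizes and attribute layout, chosen for illustration only.
V, H, W = 4, 8, 8  # number of views, image height, image width
ATTRS = {"rgb": 3, "position": 3, "scale": 3, "rotation": 4, "opacity": 1}

def splats_to_attribute_images(gaussians):
    """Split per-pixel Gaussians (V, H, W, C) into one image stack per attribute."""
    images, start = {}, 0
    for name, channels in ATTRS.items():
        images[name] = gaussians[..., start:start + channels]
        start += channels
    return images

def attribute_images_to_splats(images):
    """Inverse mapping: concatenate channels and flatten to a (V*H*W, C) Gaussian set."""
    packed = np.concatenate([images[name] for name in ATTRS], axis=-1)
    return packed.reshape(-1, packed.shape[-1])

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Single-head self-attention without learned projections; tokens: (B, N, D)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ tokens

def cross_view_attention(feats):
    """feats: (A, V, N, D). All views of the same attribute share one token sequence."""
    a, v, n, d = feats.shape
    out = self_attention(feats.reshape(a, v * n, d))
    return out.reshape(a, v, n, d)

def cross_attribute_attention(feats):
    """feats: (A, V, N, D). All attributes of the same view share one token sequence."""
    a, v, n, d = feats.shape
    x = feats.transpose(1, 0, 2, 3).reshape(v, a * n, d)
    out = self_attention(x)
    return out.reshape(v, a, n, d).transpose(1, 0, 2, 3)

# Usage with random data standing in for real Gaussians and features.
rng = np.random.default_rng(0)
gaussians = rng.normal(size=(V, H, W, sum(ATTRS.values())))
images = splats_to_attribute_images(gaussians)
splats = attribute_images_to_splats(images)

feats = rng.normal(size=(len(ATTRS), V, H * W, 16))  # assumed feature width of 16
mixed = cross_attribute_attention(cross_view_attention(feats))
```

The only difference between the two attention functions is which axis is folded into the token sequence: folding the view axis lets pixels from different viewpoints exchange information, while folding the attribute axis lets, for example, color and scale tokens of the same view attend to each other.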

Taxonomy

Topics: Computer Graphics and Visualization Techniques

Methods: Softmax · Attention Is All You Need · Diffusion