Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
Slava Elizarov, Ciara Rowles, Simon Donné

TL;DR
Geometry Image Diffusion (GIMDiffusion) offers a fast, data-efficient method for Text-to-3D generation by representing 3D shapes as 2D geometry images, enabling high-quality 3D asset creation from limited 3D training data.
Contribution
We introduce GIMDiffusion, a novel approach that uses geometry images for efficient 3D shape representation, reducing data requirements and computational costs compared to methods that rely on complex 3D-aware architectures.
Findings
Enables fast 3D generation comparable to Text-to-Image models
Produces semantically meaningful, part-aware 3D objects
Operates effectively with limited 3D training data
Abstract
Generating high-quality 3D objects from textual descriptions remains a challenging problem due to computational cost, the scarcity of 3D data, and complex 3D representations. We introduce Geometry Image Diffusion (GIMDiffusion), a novel Text-to-3D model that utilizes geometry images to efficiently represent 3D shapes using 2D images, thereby avoiding the need for complex 3D-aware architectures. By integrating a Collaborative Control mechanism, we exploit the rich 2D priors of existing Text-to-Image models such as Stable Diffusion. This enables strong generalization even with limited 3D training data (allowing us to use only high-quality training data) while retaining compatibility with guidance techniques such as IPAdapter. In short, GIMDiffusion enables the generation of 3D assets at speeds comparable to current Text-to-Image models. The generated objects consist of semantically…
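To make the idea of a geometry image concrete: each pixel stores a 3D surface position instead of a color, and the mesh connectivity is implied by the pixel grid. The sketch below decodes a single-chart geometry image into a triangle mesh by turning every 2x2 block of neighboring pixels into two triangles. It is a simplified illustration under stated assumptions, not the GIMDiffusion decoder: it assumes one regular chart with every pixel valid, and it ignores the masks and chart boundaries a multi-chart atlas would need. The function name `geometry_image_to_mesh` is hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def geometry_image_to_mesh(gim: List[List[Vec3]]) -> Tuple[List[Vec3], List[Face]]:
    """Hypothetical helper: decode a single-chart geometry image into a mesh.

    Assumption: gim[r][c] holds the (x, y, z) surface point sampled at pixel
    (r, c), and every pixel is valid (no background mask, no multiple charts).
    """
    rows = len(gim)
    cols = len(gim[0]) if rows else 0
    if any(len(row) != cols for row in gim):
        raise ValueError("geometry image must be rectangular")

    # Flatten pixels row by row; pixel (r, c) becomes vertex r * cols + c.
    vertices = [p for row in gim for p in row]

    faces: List[Face] = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i00 = r * cols + c  # top-left pixel of the 2x2 block
            i01 = i00 + 1       # top-right
            i10 = i00 + cols    # bottom-left
            i11 = i10 + 1       # bottom-right
            # Two triangles per block; the shared edge (i10, i01) is traversed
            # in opposite directions, so the winding stays consistent.
            faces.append((i00, i10, i01))
            faces.append((i01, i10, i11))
    return vertices, faces


if __name__ == "__main__":
    # A flat 3x3 geometry image: pixel (r, c) maps to the point (c, r, 0).
    grid = [[(float(c), float(r), 0.0) for c in range(3)] for r in range(3)]
    verts, faces = geometry_image_to_mesh(grid)
    print(len(verts), len(faces))  # 9 8
```

Because connectivity comes for free from the grid, a generator only has to produce an ordinary multi-channel image. This is why a 2D diffusion backbone can be reused without a 3D-aware architecture.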
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques
Methods: Diffusion
