Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting
Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu

TL;DR
This paper introduces a controllable text-to-3D generation framework that pairs MVControl, a multi-view ControlNet architecture, with an efficient multi-stage pipeline, producing high-quality 3D content guided by condition maps through surface-aligned Gaussian splatting.
Contribution
The work presents MVControl, a neural network architecture that integrates input conditions (edge, depth, normal, and scribble maps) into pre-trained multi-view diffusion models, and a multi-stage 3D generation pipeline that uses Gaussian representations and hybrid guidance.
Findings
Achieves robust generalization in controllable 3D generation.
Enables fine-grained geometry editing on 3D meshes.
Demonstrates high-quality 3D content creation guided by diverse conditions.
Abstract
While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. And, 2) we propose an efficient multi-stage 3D generation pipeline that leverages the benefits of…
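The abstract says the conditioning module steers the base diffusion model with both local and global embeddings computed from the condition images and camera poses, but it does not give the layer design. The sketch below is only an illustration of that idea, not the authors' implementation: the function names, the weight matrices, the 1x1 projection, the mean pooling, and the choice to add the global vector to a timestep embedding are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_embedding(cond_images, w_local):
    """Spatially aligned features: a 1x1 projection over channels (assumed design).

    cond_images: (V, H, W, C_in) condition maps (e.g. edge or depth) for V views.
    w_local:     (C_in, C_feat) hypothetical projection weights.
    returns:     (V, H, W, C_feat)
    """
    return np.tanh(cond_images @ w_local)

def global_embedding(cond_images, cam_poses, w_global):
    """One vector per view from pooled image statistics plus the camera pose.

    cam_poses: (V, 4, 4) camera matrices, flattened to 16 values per view.
    w_global:  (C_in + 16, C_emb) hypothetical projection weights.
    returns:   (V, C_emb)
    """
    pooled = cond_images.mean(axis=(1, 2))              # (V, C_in)
    poses = cam_poses.reshape(cam_poses.shape[0], -1)   # (V, 16)
    return np.tanh(np.concatenate([pooled, poses], axis=1) @ w_global)

def apply_condition(base_feats, time_emb, cond_images, cam_poses, w_local, w_global):
    """Inject both embeddings into stand-in base-model activations."""
    # Local control: added per pixel to the feature map.
    feats = base_feats + local_embedding(cond_images, w_local)
    # Global control: added per view to the (stand-in) timestep embedding.
    emb = time_emb + global_embedding(cond_images, cam_poses, w_global)
    return feats, emb

# Toy sizes: 4 views, 8x8 condition maps with 3 channels.
V, H, W, C_in, C_feat, C_emb = 4, 8, 8, 3, 16, 32
cond = rng.random((V, H, W, C_in))
poses = np.tile(np.eye(4), (V, 1, 1))                   # identity poses as placeholders
w_l = rng.normal(size=(C_in, C_feat))
w_g = rng.normal(size=(C_in + 16, C_emb))
feats, emb = apply_condition(
    np.zeros((V, H, W, C_feat)), np.zeros((V, C_emb)), cond, poses, w_l, w_g
)
print(feats.shape, emb.shape)
```

The split mirrors the distinction drawn in the abstract: the local path keeps the spatial layout of the condition map, while the global path compresses each view and its camera pose into a single vector.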
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Human Motion and Animation · Interactive and Immersive Displays
Methods: Diffusion · Focus · Balanced Selection
