Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

Zhiqi Li; Yiming Chen; Lingzhe Zhao; Peidong Liu

arXiv:2403.09981·cs.CV·February 11, 2025·2 cites

TL;DR

This paper introduces a controllable text-to-3D generation framework that pairs MVControl (Multi-view ControlNet), a conditioning architecture for pre-trained multi-view diffusion models, with an efficient multi-stage pipeline built on surface-aligned Gaussian splatting, so that 3D content can be guided by condition images such as edge, depth, normal, and scribble maps.

Contribution

The work presents MVControl, a new neural network architecture for integrating various input conditions into multi-view diffusion models, and a multi-stage 3D generation pipeline using Gaussian representations and hybrid guidance.

Findings

01 Achieves robust generalization in controllable 3D generation.

02 Enables fine-grained geometry editing on 3D meshes.

03 Demonstrates high-quality 3D content creation guided by diverse conditions.

Abstract

While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. And, 2) we propose an efficient multi-stage 3D generation pipeline that leverages the benefits of…
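The conditioning idea in the abstract (a module that steers a frozen base model through local and global embeddings derived from condition images and camera poses) can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the names `ConditioningModule` and `apply_control`, the zero-initialized projections (a ControlNet-style choice), mean pooling for the global embedding, and adding the global embedding to the timestep embedding are all hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vector = List[float]


def linear(weights: List[Vector], x: Vector) -> Vector:
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def zeros(rows: int, cols: int) -> List[Vector]:
    """Zero matrix with independent rows."""
    return [[0.0] * cols for _ in range(rows)]


@dataclass
class ConditioningModule:
    """Hypothetical conditioning branch (illustrative, not the paper's code)."""
    feat_dim: int   # channel size of the base model's features (assumed)
    cond_dim: int   # channel size of the encoded condition image (assumed)
    pose_dim: int   # length of the flattened camera pose vector (assumed)
    local_proj: List[Vector] = field(init=False)
    global_proj: List[Vector] = field(init=False)

    def __post_init__(self) -> None:
        # Zero initialization (assumed): an untrained module leaves the
        # frozen base model's behavior unchanged.
        self.local_proj = zeros(self.feat_dim, self.cond_dim)
        self.global_proj = zeros(self.feat_dim, self.cond_dim + self.pose_dim)

    def local_embedding(self, cond_feats: List[Vector]) -> List[Vector]:
        # One residual per spatial location of the condition image.
        return [linear(self.local_proj, c) for c in cond_feats]

    def global_embedding(self, cond_feats: List[Vector], pose: Vector) -> Vector:
        # Mean-pool the condition features, then append the camera pose.
        n = len(cond_feats)
        pooled = [sum(col) / n for col in zip(*cond_feats)]
        return linear(self.global_proj, pooled + pose)


def apply_control(
    base_feats: List[Vector],
    time_emb: Vector,
    module: ConditioningModule,
    cond_feats: List[Vector],
    pose: Vector,
) -> Tuple[List[Vector], Vector]:
    """Add local residuals to features and the global one to the time embedding."""
    local = module.local_embedding(cond_feats)
    glob = module.global_embedding(cond_feats, pose)
    feats = [[b + r for b, r in zip(bf, lf)] for bf, lf in zip(base_feats, local)]
    emb = [t + g for t, g in zip(time_emb, glob)]
    return feats, emb
```

With the zero-initialized projections, `apply_control` returns the base features and time embedding unchanged. Control appears only once the projection weights move away from zero, which is the usual motivation for this initialization when adding a conditioning branch to a pre-trained model.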

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Human Motion and Animation · Interactive and Immersive Displays

Methods: Diffusion · Focus · Balanced Selection