Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data
Zhiyuan Ma, Xinyue Liang, Rongyuan Wu, Xiangyu Zhu, Zhen Lei, Lei Zhang

TL;DR
This paper introduces Progressive Rendering Distillation (PRD), a training scheme that adapts Stable Diffusion into a fast, high-quality 3D mesh generator from text prompts without requiring 3D ground-truth data.
Contribution
The paper proposes a distillation method that uses multi-view diffusion models, such as MVDream and RichDreamer, as teachers to train a 3D generator adapted from Stable Diffusion. It reports better quality and faster generation than previous text-to-3D methods.
Findings
Produces 3D meshes in 1.2 seconds
Outperforms previous text-to-3D methods in quality
Supports training without 3D ground-truths
Abstract
It is highly desirable to obtain a model that can generate high-quality 3D meshes from text prompts in just seconds. While recent attempts have adapted pre-trained text-to-image diffusion models, such as Stable Diffusion (SD), into generators of 3D representations (e.g., Triplane), they often suffer from poor quality due to the lack of sufficient high-quality 3D training data. Aiming at overcoming the data shortage, we propose a novel training scheme, termed as Progressive Rendering Distillation (PRD), eliminating the need for 3D ground-truths by distilling multi-view diffusion models and adapting SD into a native 3D generator. In each iteration of training, PRD uses the U-Net to progressively denoise the latent from random noise for a few steps, and in each step it decodes the denoised latent into 3D output. Multi-view diffusion models, including MVDream and RichDreamer, are used in…
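The loop structure described in the abstract (start from random noise, denoise for a few steps, decode a 3D output at every step) can be sketched as follows. This is a toy illustration of the control flow only, not the authors' implementation: `toy_denoiser`, `toy_decoder`, `toy_supervision`, and `prd_iteration` are hypothetical stand-ins, the latent is a plain list of floats, and the supervision is a placeholder, because the visible text is cut off before it says how the multi-view diffusion models are used.

```python
import random


def toy_denoiser(latent, step, num_steps):
    """Hypothetical stand-in for the U-Net: scale the latent down each step."""
    keep = 1.0 - (step + 1) / num_steps
    return [keep * x for x in latent]


def toy_decoder(latent):
    """Hypothetical stand-in for decoding a latent into a 3D output."""
    return {"features": list(latent)}


def toy_supervision(output_3d):
    """Placeholder score for one decoded output; NOT the paper's loss."""
    feats = output_3d["features"]
    return sum(x * x for x in feats) / len(feats)


def prd_iteration(latent_size=8, num_steps=4, seed=0):
    """One training iteration: denoise progressively, decode at every step."""
    rng = random.Random(seed)
    # Start from random noise, as the abstract describes.
    latent = [rng.gauss(0.0, 1.0) for _ in range(latent_size)]
    per_step_scores = []
    for step in range(num_steps):
        latent = toy_denoiser(latent, step, num_steps)  # one denoising step
        output_3d = toy_decoder(latent)                 # decode at this step
        per_step_scores.append(toy_supervision(output_3d))
    return per_step_scores


scores = prd_iteration()
```

The point of the sketch is that every intermediate latent is decoded and scored, rather than only the final one; in a real setup the per-step scores would be replaced by losses from the teacher models and backpropagated into the generator.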
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction · 3D Shape Modeling and Analysis
Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion
