Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data

Zhiyuan Ma; Xinyue Liang; Rongyuan Wu; Xiangyu Zhu; Zhen Lei; Lei Zhang

arXiv:2503.21694·cs.GR·March 28, 2025·2 cites

TL;DR

This paper introduces Progressive Rendering Distillation (PRD), a training scheme that adapts Stable Diffusion into a fast, high-quality 3D mesh generator from text prompts without requiring 3D ground-truth data.

Contribution

The paper proposes a distillation method that uses multi-view diffusion models to train a 3D generator without 3D ground-truth data, improving both quality and speed over previous text-to-3D methods.

Findings

01 Produces 3D meshes in 1.2 seconds
02 Outperforms previous text-to-3D methods in quality
03 Supports training without 3D ground-truths

Abstract

It is highly desirable to obtain a model that can generate high-quality 3D meshes from text prompts in just seconds. While recent attempts have adapted pre-trained text-to-image diffusion models, such as Stable Diffusion (SD), into generators of 3D representations (e.g., Triplane), they often suffer from poor quality due to the lack of sufficient high-quality 3D training data. To overcome this data shortage, we propose a novel training scheme, termed Progressive Rendering Distillation (PRD), which eliminates the need for 3D ground-truths by distilling multi-view diffusion models and adapting SD into a native 3D generator. In each training iteration, PRD uses the U-Net to progressively denoise the latent from random noise for a few steps, and at each step it decodes the denoised latent into a 3D output. Multi-view diffusion models, including MVDream and RichDreamer, are used in…
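The per-iteration rollout described in the abstract (start from random noise, denoise for a few steps, decode a 3D output at every step) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_denoise_step` and `toy_decode_to_3d` are hypothetical stand-ins for the U-Net and the 3D decoder, the latent is a plain list of floats, and the halving update is not the paper's sampler. The abstract excerpt is cut off before it explains how the multi-view diffusion models supervise these outputs, so that part is deliberately left out.

```python
import random


def toy_denoise_step(latent):
    """Hypothetical stand-in for one U-Net denoising step.

    It only halves each value; a real step would predict and remove noise.
    """
    return [0.5 * x for x in latent]


def toy_decode_to_3d(latent):
    """Hypothetical stand-in for decoding a latent into a 3D representation."""
    return {"triplane": list(latent)}


def progressive_rollout(latent_size=8, num_steps=4, seed=0):
    """Denoise from random noise for a few steps, decoding a 3D output at each step."""
    rng = random.Random(seed)
    # Start from random Gaussian noise, as the abstract describes.
    latent = [rng.gauss(0.0, 1.0) for _ in range(latent_size)]
    outputs = []
    for _ in range(num_steps):
        latent = toy_denoise_step(latent)
        # Every intermediate latent is decoded, not just the final one.
        outputs.append(toy_decode_to_3d(latent))
    return outputs


if __name__ == "__main__":
    for i, out in enumerate(progressive_rollout()):
        print(f"step {i}: decoded {len(out['triplane'])} values")
```

The point of the sketch is the loop structure: one decoded 3D output per denoising step, so a training signal could in principle be applied to each of them rather than only to the last.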

Code & Models

Repositories

theericma/triplaneturbo
jax · Official

Models

🤗 ZhiyuanthePony/TriplaneTurbo
model · ♡ 4

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction · 3D Shape Modeling and Analysis

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion