Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

Xingyu Miao; Haoran Duan; Varun Ojha; Jun Song; Tejal Shah; Yang Long; Rajiv Ranjan

arXiv:2405.11252 · cs.CV · May 21, 2024 · 1 citation

TL;DR

This paper introduces Trajectory Score Matching (TSM), a novel method that improves the stability and quality of high-resolution text-to-3D generation by reducing error accumulation in diffusion models, and enhances multi-stage optimization with Stable Diffusion XL.

Contribution

The paper proposes TSM to address pseudo ground truth inconsistency in diffusion-based 3D generation and integrates Stable Diffusion XL with pixel-wise gradient clipping for better high-resolution results.
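
As a rough illustration of what pixel-wise gradient clipping can look like, the sketch below rescales the gradient at each pixel so that its norm across channels does not exceed a threshold. This page does not state the paper's exact clipping rule, so the norm-based form, the function name `clip_gradient_per_pixel`, and the `max_norm` parameter are all assumptions made for illustration.

```python
import numpy as np


def clip_gradient_per_pixel(grad, max_norm, eps=1e-12):
    """Hypothetical helper: clip a (C, H, W) gradient pixel by pixel.

    Each pixel's channel vector is scaled down so its L2 norm is at most
    `max_norm`. Pixels already under the threshold are left unchanged,
    and the direction of every per-pixel gradient is preserved.
    """
    # L2 norm over the channel axis, kept as shape (1, H, W) for broadcasting.
    norms = np.linalg.norm(grad, axis=0, keepdims=True)
    # Scale factor is 1 where the norm is small, max_norm / norm elsewhere.
    scale = np.minimum(1.0, max_norm / (norms + eps))
    return grad * scale


# Toy gradient: one pixel with a large gradient, one with a small one.
grad = np.zeros((3, 2, 2))
grad[:, 0, 0] = [3.0, 4.0, 0.0]   # norm 5, will be clipped
grad[:, 1, 1] = [0.1, 0.0, 0.0]   # norm 0.1, left as-is
clipped = clip_gradient_per_pixel(grad, max_norm=1.0)
```

Clipping per pixel rather than over the whole tensor means a few outlier pixels cannot shrink the update applied to every other pixel, which is one plausible reason such a scheme helps stability at high resolution.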

Findings

01. TSM reduces error accumulation compared to ISM.

02. Per the paper, the method surpasses prior state-of-the-art methods in visual quality.

03. Stable Diffusion XL with gradient clipping improves 3D generation stability.

Abstract

In this work, we propose a novel Trajectory Score Matching (TSM) method that aims to solve the pseudo ground truth inconsistency problem caused by the accumulated error in Interval Score Matching (ISM) when using the Denoising Diffusion Implicit Models (DDIM) inversion process. Unlike ISM which adopts the inversion process of DDIM to calculate on a single path, our TSM method leverages the inversion process of DDIM to generate two paths from the same starting point for calculation. Since both paths start from the same starting point, TSM can reduce the accumulated error compared to ISM, thus alleviating the problem of pseudo ground truth inconsistency. TSM enhances the stability and consistency of the model's generated paths during the distillation process. We demonstrate this experimentally and further show that ISM is a special case of TSM. Furthermore, to optimize the current…
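
The idea of running DDIM inversion along two paths from the same starting point can be sketched with the standard deterministic DDIM inversion update. The code below shows only that generic update and two inversion runs that share one starting latent. How TSM combines the two paths in its loss is not given in the truncated abstract and is not shown here. The names `ddim_invert_step`, `ddim_invert`, and `toy_eps`, the noise schedule, and the chosen timesteps are all hypothetical.

```python
import numpy as np


def ddim_invert_step(x_t, eps_pred, alpha_bar_t, alpha_bar_next):
    """One deterministic DDIM inversion step from noise level t to a noisier one."""
    # Estimate the clean sample implied by the current latent and noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the next (noisier) level using the same prediction.
    return np.sqrt(alpha_bar_next) * x0_pred + np.sqrt(1.0 - alpha_bar_next) * eps_pred


def ddim_invert(x0, eps_model, alpha_bars, path):
    """Follow `path` (increasing timestep indices, starting at 0) from x0."""
    x = x0
    for t_cur, t_next in zip(path[:-1], path[1:]):
        eps = eps_model(x, t_cur)
        x = ddim_invert_step(x, eps, alpha_bars[t_cur], alpha_bars[t_next])
    return x


# Assumed toy setup: index 0 is the clean sample (alpha_bar = 1).
alpha_bars = np.linspace(1.0, 0.1, 10)


def toy_eps(x, t):
    # Stand-in for a diffusion model's noise prediction (illustrative only).
    return 0.1 * x


x0 = np.ones(4)
# Two inversion paths that share the same starting point x0.
x_path_a = ddim_invert(x0, toy_eps, alpha_bars, [0, 2, 4])
x_path_b = ddim_invert(x0, toy_eps, alpha_bars, [0, 3, 6, 9])
```

Each inversion step reuses a noise prediction made at the previous latent, so approximation error grows with the number of steps taken along a single path. Starting both paths from the same point keeps them anchored to a common reference.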

Code & Models

Repositories

xingy038/dreamer-xl
PyTorch · Official

Taxonomy

Topics: Human Motion and Animation · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Gradient Clipping · Diffusion