Flow Score Distillation for Diverse Text-to-3D Generation

Runjie Yan; Kailu Wu; Kaisheng Ma

arXiv:2405.10988·cs.LG·July 30, 2024

Open Access · 4 Reviews

TL;DR

This paper introduces Flow Score Distillation (FSD), a novel method that improves diversity in text-to-3D generation by modifying noise sampling strategies, building on insights from Score Distillation Sampling and DDIM models.

Contribution

The paper reveals the connection between SDS and DDIM, and proposes a new noise sampling approach that significantly enhances diversity in text-to-3D generation.

Findings

01. FSD improves diversity without losing quality.

02. The noise sampling strategy is crucial for diversity.

03. FSD outperforms existing methods in experiments.
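The link between noise and diversity in the findings above can be illustrated with a plain deterministic DDIM sampler, where the output is a function of the initial noise alone. The following is a minimal sketch, not the paper's FSD method: it assumes a toy Gaussian data distribution in place of a pretrained diffusion model, and the names `MU`, `S`, `schedule`, `toy_eps_pred`, and `ddim_sample` are hypothetical.

```python
import numpy as np

# Assumption: toy data distribution N(MU, S^2 I) instead of real images.
MU, S = 1.0, 0.5


def schedule(t):
    """Hypothetical cosine schedule with alpha^2 + sigma^2 = 1."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def toy_eps_pred(x_t, alpha, sigma):
    """Exact noise prediction for Gaussian data; stands in for a pretrained model."""
    return sigma * (x_t - alpha * MU) / (alpha**2 * S**2 + sigma**2)


def ddim_sample(eps_init, num_steps=50, t_start=0.98):
    """Deterministic DDIM (eta = 0): the result depends only on eps_init."""
    ts = np.linspace(t_start, 0.0, num_steps + 1)
    x = eps_init.copy()  # treat x at t_start as (almost) pure noise
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        a_cur, s_cur = schedule(t_cur)
        a_next, s_next = schedule(t_next)
        eps_hat = toy_eps_pred(x, a_cur, s_cur)
        x0_hat = (x - s_cur * eps_hat) / a_cur   # predicted clean sample
        x = a_next * x0_hat + s_next * eps_hat   # move to the next noise level
    return x


rng = np.random.default_rng(0)
eps_a, eps_b = rng.standard_normal(4), rng.standard_normal(4)
sample_a, sample_b = ddim_sample(eps_a), ddim_sample(eps_b)
print(sample_a, sample_b)
```

In this toy setting, different initial noise vectors give different samples, and repeating a noise vector reproduces the same sample, so all of the sampler's diversity comes from how the noise is chosen.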

Abstract

Recent advancements in Text-to-3D generation have yielded remarkable progress, particularly through methods that rely on Score Distillation Sampling (SDS). While SDS can create impressive 3D assets, it is hindered by its inherently maximum-likelihood-seeking nature, which limits the diversity of its generation outcomes. In this paper, we discover that the Denoising Diffusion Implicit Models (DDIM) generation process (i.e., the PF-ODE) can be succinctly expressed using an analogue of the SDS loss. Going one step further, SDS can be seen as a generalized DDIM generation process. Following this insight, we show that the noise sampling strategy in the noise addition stage significantly restricts the diversity of generation results. To address this limitation, we present an innovative noise sampling approach and introduce a novel text-to-3D method called Flow Score Distillation (FSD).…
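The mode-seeking behaviour the abstract attributes to SDS can be seen in a small experiment. The sketch below uses the standard SDS update, w(t) · (ε_pred(x_t, t) − ε) with x_t = α_t·x + σ_t·ε and fresh noise ε at every step, applied directly to a vector x (so the renderer Jacobian is the identity). It is an illustration under stated assumptions, not the paper's implementation: the Gaussian "model", the cosine schedule, and the names `toy_eps_pred`, `sds_grad`, and `run_sds` are all hypothetical.

```python
import numpy as np

# Assumption: toy data distribution N(MU, S^2 I) instead of a text-conditioned model.
MU, S = 1.0, 0.5


def toy_eps_pred(x_t, alpha, sigma):
    """Exact noise prediction for Gaussian data; stands in for a pretrained model."""
    return sigma * (x_t - alpha * MU) / (alpha**2 * S**2 + sigma**2)


def sds_grad(x, eps, t, w=1.0):
    """SDS gradient w * (eps_pred - eps) for an identity 'renderer' (x is the image)."""
    alpha, sigma = np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)
    x_t = alpha * x + sigma * eps  # noise addition stage
    return w * (toy_eps_pred(x_t, alpha, sigma) - eps)


def run_sds(seed, steps=1000, lr=0.02, dim=4):
    """Gradient descent with a new timestep and new Gaussian noise at every step."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    for _ in range(steps):
        t = rng.uniform(0.02, 0.98)
        eps = rng.standard_normal(dim)
        x = x - lr * sds_grad(x, eps, t)
    return x


# Independent runs with different seeds.
results = np.stack([run_sds(seed) for seed in range(4)])
print(results.round(2))
```

In this toy case every run ends close to the mode `MU` regardless of the seed, whereas samples drawn from the distribution itself would spread out with standard deviation `S`. That gap is the kind of diversity loss the abstract describes.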

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 4

Strengths

S1. This paper aims to solve an important problem: the lack of diversity in 3D results generated by score distillation.
S2. The proposed world-map noise function is interesting.
S3. The results show varied 3D objects for the same prompt, without performance degradation compared with SDS.

Weaknesses

W1. Lack of novelty and originality. Existing papers have already discussed the connection between DDIM and SDS [NewRef-1, NewRef-2] and the use of a fixed noise for SDS [NewRef-2, NewRef-3]. In addition, the paper does not give sufficient rationale for how the proposed noise sampling resolves the issues that arise when using a fixed noise.
W2. Insufficient experiments. The paper lacks in-depth analysis of the proposed noise sampling technique. In addition, some results of previous methods show much diffe

Reviewer 02 · Rating 3 · Confidence 3

Strengths

- This paper formulates SDS as a generalized DDIM (Denoising Diffusion Implicit Models) process and introduces a world-map noise function for 3D generation. The noise mechanism design is simple and seems effective.
- I like the quality of its generated 3D assets, which are sharp and come with fine-grained details. The quality of the results is consistent across various examples from the main paper and the supplementary material.
- It also reveals the relationship between the initial noise map and the final 3D assets, which is insightful

Weaknesses

- While I like the quality of the plotted 3D asset examples, my biggest concern is the significance and novelty of the contribution. Strategies such as leveraging multi-view diffusion models, scheduled noise level annealing, and the SDS formulation have already been extensively explored in previous works, starting from MV-Dream, ProlificDreamer, and VSD. The innovation in noise map sampling is kind of weak.
- I think the comparisons to baselines are also not fair or informative enough, as the proposed method is

Reviewer 03 · Rating 5 · Confidence 3

Strengths

- The paper builds a connection between the PF-ODE and the SDS gradient to show that the SDS gradient is equivalent to some term in diffusion.
- The paper proposes to use a consistent noise (which is only possible with its FSD formulation) to reduce multi-face problems.
- The paper shows that the proposed method can improve consistency.

Weaknesses

- Inadequate analysis to support the claimed improvement in quality and diversity. The paper claims that the proposed method has better quality and diversity compared with previous methods. I appreciate the FID analysis experiment. However, it is not convincing. Firstly, the FID is computed between the rendered images and generated images, where the rendered images are from 16 (prompts) x 4 (seeds) = 64 3D objects (from my understanding). The number of evaluated 3D objects is relatively low. Seco

Reviewer 04 · Rating 5 · Confidence 5

Strengths

1. The paper is well-written and easy to understand.
2. The paper builds a connection between the SDS loss and the DDIM generation process, which is valuable and could inspire further research.
3. The proposed coarse-to-fine pipeline, though primarily an engineering solution, effectively contributes to generating high-quality 3D objects.

Weaknesses

1. The concept of applying deterministic (fixed) noise to the SDS loss was introduced in previous work [1], but this prior work is neither cited nor discussed in the paper. My understanding is that the single-noise training in [1] is equivalent to the vanilla design described in Sec 4.1.1.
2. The proposed world-map noise function does not account for depth information -- noise is sampled without considering the shape of the generated object. Moreover, in the context of latent diffusion models,

Taxonomy

Topics: Human Motion and Animation · Computer Graphics and Visualization Techniques · Video Analysis and Summarization

Methods: Diffusion