Consistent Flow Distillation for Text-to-3D Generation

Runjie Yan; Yinbo Chen; Xiaolong Wang

arXiv:2501.05445·cs.CV·January 10, 2025

Open Access·3 Reviews

TL;DR

This paper introduces Consistent Flow Distillation (CFD), a novel method that improves text-to-3D generation by ensuring multi-view consistency in flow gradients, leading to higher quality and diversity.

Contribution

We propose CFD, which leverages gradient-guided sampling and multi-view consistent noise to enhance 3D generation quality and diversity over existing SDS-based methods.
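As a hedged illustration of the deterministic sampling process that gradient-guided sampling builds on, the following toy sketch integrates a probability-flow ODE in one dimension. It is not the CFD implementation: the closed-form Gaussian score (standing in for a learned diffusion model), the sigma parametrization dx/dsigma = -sigma * score, and the names `score` and `pf_ode_sample` are all assumptions made for illustration.

```python
import numpy as np

def score(x, sigma, data_std=1.0):
    # Exact score of N(0, data_std^2 + sigma^2); a stand-in for a learned model.
    return -x / (data_std**2 + sigma**2)

def pf_ode_sample(x_init, sigmas, data_std=1.0):
    # Euler integration of dx/dsigma = -sigma * score(x, sigma),
    # stepping from the largest noise level down to zero.
    x = np.array(x_init, dtype=float)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        dx_dsigma = -s_cur * score(x, s_cur, data_std)
        x = x + (s_next - s_cur) * dx_dsigma
    return x

sigma_max = 10.0
sigmas = np.linspace(sigma_max, 0.0, 2001)

rng = np.random.default_rng(0)
x_T = sigma_max * rng.standard_normal(5)  # initial noise (toy choice)

# The ODE is deterministic: the same initial noise gives the same sample.
sample_1 = pf_ode_sample(x_T, sigmas)
sample_2 = pf_ode_sample(x_T, sigmas)

# For this toy Gaussian the ODE has a closed-form solution to compare against.
closed_form = x_T * 1.0 / np.sqrt(1.0 + sigma_max**2)
```

The point of the toy is only that, once the initial noise is fixed, the ODE trajectory and its endpoint are fixed too, which is the property a fixed, shared noise can exploit.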

Findings

01

CFD outperforms previous methods in text-to-3D generation.

02

Multi-view consistency improves 3D visual quality.

03

Gradient-based guidance enhances diversity and fidelity.

Abstract

Score Distillation Sampling (SDS) has made significant strides in distilling image-generative models for 3D generation. However, its maximum-likelihood-seeking behavior often leads to degraded visual quality and diversity, limiting its effectiveness in 3D applications. In this work, we propose Consistent Flow Distillation (CFD), which addresses these limitations. We begin by leveraging the gradient of the diffusion ODE or SDE sampling process to guide the 3D generation. From the gradient-based sampling perspective, we find that the consistency of 2D image flows across different viewpoints is important for high-quality 3D generation. To achieve this, we introduce multi-view consistent Gaussian noise on the 3D object, which can be rendered from various viewpoints to compute the flow gradient. Our experiments demonstrate that CFD, through consistent flows, significantly outperforms…
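The idea of noise that lives on the 3D object and is rendered into each view can be sketched with a toy example. This is a minimal sketch under stated assumptions, not the paper's noise construction: noise is stored per surface point, "rendering" is reduced to an index lookup, and the visibility arrays `view_a` and `view_b` as well as the helper names are hypothetical.

```python
import numpy as np

def make_surface_noise(num_points, channels=4, seed=0):
    # One Gaussian noise vector per 3D surface point (toy assumption).
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_points, channels))

def render_noise(surface_noise, visible_idx):
    # Toy "render": each pixel copies the noise of the surface point it sees.
    return surface_noise[visible_idx]

surface_noise = make_surface_noise(1000)

# Hypothetical visibility maps: index of the surface point seen by each pixel.
view_a = np.array([3, 17, 256, 999])
view_b = np.array([256, 3, 500, 17])

noise_a = render_noise(surface_noise, view_a)
noise_b = render_noise(surface_noise, view_b)

# Baseline for contrast: noise drawn independently for each view.
rng = np.random.default_rng(1)
indep_a = rng.standard_normal((4, 4))
indep_b = rng.standard_normal((4, 4))

# Surface point 3 is pixel 0 in view A and pixel 1 in view B.
same_point_consistent = np.allclose(noise_a[0], noise_b[1])
same_point_independent = np.allclose(indep_a[0], indep_b[1])
```

In this toy, a surface point carries the same noise value in every view that sees it, whereas independently sampled per-view noise does not. Each pixel's noise stays standard Gaussian only as long as different pixels in a view see different surface points, which is a simplification of real rendering.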

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 8·Confidence 3

Strengths

[**Novelty**]
- The adaptation of probability flow ODE (PF-ODE) with clean flow gradient from 2D images to guide 3D generation is innovative.
- The adaptation of multi-view consistent Gaussian noise ensures a unified appearance from all angles, which is the key to high-fidelity texture generation.

[**Significance**]
- The proposed design of multi-view consistent noise is useful for the whole community, its performance boost in 3D-FID and 3D-CLIP scores compared to SDS, ISM, and VSD baselines, and

Weaknesses

- On significance and novelty, I think based on the current progress in the field of 3DGen AI, although CFD introduces innovative noise techniques, it doesn’t propose entirely new model architectures or evaluation metrics beyond standard score distillation approaches. This is a more fundamental concern when existing 3D generative models can generate high-quality 3D assets within minutes, while this approach can still take hours.
- Limited Qualitative Examples for long and complex Prompts: althoug

Reviewer 02·Rating 6·Confidence 4

Strengths

- The paper stands out for its robust presentation of results, thorough experimental analysis, and compelling evidence.
- Introducing Consistent Flow Distillation, the paper leverages 2D clean flow gradients and multi-view consistent noise to elevate the diversity and quality of 3D generation.
- Through empirical results, it is evident that the proposed CFD effectively enhances the diversity of generated outputs, showcasing its potency in improving the quality and variety of 3D-generated content

Weaknesses

- The stated contributions appear to overlap with existing methodologies.
- The utilization of SDE formulations mirrors the approach outlined in "Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior." While Consistent3D emphasizes addressing the unpredictability inherent in SDE sampling by introducing a deterministic sampling prior, the rationale behind employing image PF-ODE to steer 3D generation remains ambiguous.
- The concept of mult

Reviewer 03·Rating 6·Confidence 3

Strengths

1. The generation quality of this proposed method is very high. The textures are realistic and detailed, providing a high level of visual fidelity that closely resembles real-world materials.
2. This paper is well-written, with well-organized sections that guide the reader through theory and methodology.

Weaknesses

1. Concurrent work. I believe it is necessary for the authors to clarify the distinction between their approach and “Consistent Flow Distillation for Text-to-3D Generation” within the main body of this paper. The ODE-based optimization and multi-view consistent Gaussian noise used here are quite similar to those in FSD, which warrants a more explicit comparison.
2. Experimental Setup. (a) It’s unclear whether CFD utilizes MVDream in the comparison experiments, and if so, this may introduce an un

Taxonomy

Topics·Human Motion and Animation

Methods·Diffusion