Noise-Free Score Distillation

Oren Katzir; Or Patashnik; Daniel Cohen-Or; Dani Lischinski

arXiv:2310.17590 · cs.CV · October 27, 2023 · 6 cites

Open Access · 3 Reviews

TL;DR

This paper introduces Noise-Free Score Distillation (NFSD), a simplified method for distilling pre-trained text-to-image diffusion models that reduces reliance on large guidance scales and improves output quality.

Contribution

We reinterpret SDS to explain the role of guidance scales and propose NFSD, a minimal modification that enhances distillation effectiveness and output realism.

Findings

01. NFSD achieves better distillation with lower guidance scales.

02. NFSD prevents over-smoothing of generated images.

03. Qualitative results show improved realism and prompt adherence.

Abstract

Score Distillation Sampling (SDS) has emerged as the de facto approach for text-to-content generation in non-image domains. In this paper, we reexamine the SDS process and introduce a straightforward interpretation that demystifies the necessity for large Classifier-Free Guidance (CFG) scales, rooted in the distillation of an undesired noise term. Building upon our interpretation, we propose a novel Noise-Free Score Distillation (NFSD) process, which requires minimal modifications to the original SDS framework. Through this streamlined design, we achieve more effective distillation of pre-trained text-to-image diffusion models while using a nominal CFG scale. This strategic choice allows us to prevent the over-smoothing of results, ensuring that the generated data is both realistic and complies with the desired prompt. To demonstrate the efficacy of NFSD, we provide qualitative examples…
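
The difference between the SDS residual and a noise-free residual can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The lists of floats stand in for the outputs of a pre-trained noise predictor. The function names, the `t_small` threshold, and the guidance-scale values are hypothetical choices made for this example. The way the domain term is estimated here (the unconditional prediction at small timesteps, and the unconditional prediction minus a negative-prompt prediction otherwise) is our reading of the decomposition into condition, domain, and noise terms, and should be checked against the paper.

```python
import random


def cfg_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward the conditional one by a factor of `scale`."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def sds_residual(eps_uncond, eps_cond, noise, scale):
    """SDS-style residual: guided prediction minus the injected noise."""
    guided = cfg_noise(eps_uncond, eps_cond, scale)
    return [g - n for g, n in zip(guided, noise)]


def nfsd_residual(eps_uncond, eps_cond, eps_neg, t, scale, t_small=200):
    """Noise-free residual (sketch): domain term + scale * condition term.

    Assumption: the domain term is the unconditional prediction for small t,
    and (unconditional - negative-prompt prediction) otherwise.
    `t_small` is a hypothetical threshold parameter.
    """
    out = []
    for u, c, n in zip(eps_uncond, eps_cond, eps_neg):
        delta_c = c - u                       # condition direction
        delta_d = u if t < t_small else u - n  # assumed domain estimate
        out.append(delta_d + scale * delta_c)
    return out


# Toy stand-ins for model outputs; a real setup would query a diffusion model.
random.seed(0)
dim = 6
eps_uncond, eps_cond, eps_neg, noise = (
    [random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(4)
)

# Illustrative scales only: a large one for SDS, a nominal one for NFSD.
g_sds = sds_residual(eps_uncond, eps_cond, noise, scale=100.0)
g_nfsd = nfsd_residual(eps_uncond, eps_cond, eps_neg, t=500, scale=7.5)
print(len(g_sds), len(g_nfsd))
```

In this sketch the SDS residual contains the term `eps_uncond - noise`, which is where a residual noise component can enter the gradient. The noise-free variant replaces that term with an estimate of the domain direction alone, so the condition term no longer needs a very large scale to dominate.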

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 · marginally above the acceptance threshold · Confidence 5

Strengths

S1. The proposed method, NFSD, is simple yet effective. In addition, the qualitative results support and demonstrate the effectiveness of NFSD.
S2. The paper is well-organized and easy to understand.
S3. The analogical decomposition of scores into three terms is interesting and makes sense.

Weaknesses

W1. Although the score decomposition is interesting, the proposed method rests on numerous assumptions drawn from empirical findings rather than on a principled approach.
W2. Thorough experiments to validate the effectiveness of NFSD are absent. Although the qualitative results show higher text-to-NeRF quality than conventional SDS-based approaches, there is no ablation study or quantitative result.
W3. Some technical parts lack sufficient rationale. For example, estimating the domain score by…

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 5

Strengths

+ The paper is well structured and organized. The method introduced in this paper is intuitive and straightforward to implement. The motivations behind the approach are vividly conveyed through clear formulations and effective visualizations.
+ The decomposition of SDS is both novel and intriguing. It not only offers a compelling interpretation of the large CFG weight selection in DreamFusion but also offers valuable insights into DDS [1] and VSD [2].
+ The empirical results clearly demonstrat…

Weaknesses

- While the explanation is intuitively presented, it remains somewhat challenging to discern the fundamental distinction from the negative prompt trick.
- In Sec. 5, the paper asserts that NFSD is notably more efficient than VSD, despite sharing a similar working mechanism. Although this claim appears obvious, I would recommend providing quantitative evidence to substantiate this advantage when compared to other baseline methods. It is conceivable that dropping the noise term could even speed u…

Reviewer 03 · Rating 8 · accept, good paper · Confidence 4

Strengths

This paper proposes a decomposition that addresses the ambiguous results caused by the mismatch between the distribution of images produced by the generator and that of the original images, and uses this decomposition to explain why previous methods improved on SDS. The experimental results are intuitive.

Weaknesses

I'm concerned about whether p_{neg} = “unrealistic, blurry, low quality, out of focus, ugly, low contrast, dull, dark, low-resolution, gloomy” generalizes across situations and can reliably cancel out \delta_{N}. Could a better generator g(\theta) achieve the same effect, or could a model be trained to estimate the bias \delta_{N}?

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Model Reduction and Neural Networks

Methods: Diffusion