Text-to-Image Rectified Flow as Plug-and-Play Priors

Xiaofeng Yang; Cheng Chen; Xulei Yang; Fayao Liu; Guosheng Lin

arXiv:2406.03293 · cs.CV · February 21, 2025

TL;DR

This paper introduces rectified flow models as efficient and high-quality priors for generative tasks, demonstrating their advantages over diffusion models in text-to-3D generation, image inversion, and editing.

Contribution

It presents the first theoretical and experimental validation of rectified flow as a versatile prior, outperforming diffusion models in quality and efficiency.

Findings

1. Rectified flow priors outperform diffusion-based priors in text-to-3D generation.
2. The method enables effective image inversion and editing.
3. Fewer inference steps are needed compared to diffusion models.
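The findings on inversion and on fewer inference steps both rest on the straight-path property of rectified flow. Here is a minimal toy sketch of that property, not the paper's method. It assumes the common convention x_t = (1 - t) x_0 + t ε, with data at t = 0 and noise at t = 1. It also uses a hypothetical 2-D "data distribution" whose straight-line velocity field is known in closed form, so no trained network is needed. The names `velocity`, `euler`, `MU` and `SIGMA` are illustrative only.

```python
import numpy as np

# Hypothetical toy data: x0 = MU + SIGMA * eps with eps ~ N(0, I).
# Assumed convention: x_t = (1 - t) * x0 + t * eps, so each sample moves on a
# straight line from data (t = 0) to noise (t = 1).
MU = np.array([2.0, -1.0])
SIGMA = 0.5


def velocity(x_t: np.ndarray, t: float) -> np.ndarray:
    """Closed-form velocity dx_t/dt = eps - x0 for the toy straight-line coupling."""
    # Recover the noise that generated x_t, then the matching data point.
    eps = (x_t - (1.0 - t) * MU) / ((1.0 - t) * SIGMA + t)
    x0 = MU + SIGMA * eps
    return eps - x0


def euler(x: np.ndarray, t_start: float, t_end: float, steps: int) -> np.ndarray:
    """Integrate dx/dt = velocity(x, t) from t_start to t_end with fixed Euler steps."""
    ts = np.linspace(t_start, t_end, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity(x, t_cur)
    return x


# Usage: one-step generation (noise -> data) and one-step inversion (data -> noise).
rng = np.random.default_rng(0)
eps = rng.standard_normal(2)
x0 = euler(eps, 1.0, 0.0, steps=1)
eps_rec = euler(x0, 0.0, 1.0, steps=1)
print(np.allclose(x0, MU + SIGMA * eps), np.allclose(eps_rec, eps))  # prints: True True
```

Every trajectory in this toy is a straight line traversed at constant velocity, so a single Euler step is exact in both directions. A trained rectified flow model is only approximately straight. Even so, this gives the intuition for why it can need fewer inference steps than a diffusion model and why ODE-based inversion is well behaved.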

Abstract

Large-scale diffusion models have achieved remarkable performance in generative tasks. Beyond their initial training applications, these models have proven their ability to function as versatile plug-and-play priors. For instance, 2D diffusion models can serve as loss functions to optimize 3D implicit models. Rectified flow, a novel class of generative models, enforces a linear progression from the source to the target distribution and has demonstrated superior performance across various domains. Rectified flow approaches surpass diffusion-based methods in generation quality and efficiency, requiring fewer inference steps. In this work, we present theoretical and experimental evidence demonstrating that rectified-flow-based methods offer functionality similar to diffusion models: they can also serve as effective priors. Besides the generative capabilities of…
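The abstract claims that a rectified flow model can act as a loss, much as a 2D diffusion model does under SDS. The following is a minimal sketch of that idea under stated assumptions, not the paper's RFDS implementation. It assumes an SDS-style gradient in which the diffusion noise residual is replaced by the velocity residual v(x_t, t) - (ε - x), and it uses a hypothetical weighting w(t) = t. It also replaces both the pretrained model and the differentiable 3D renderer with toy stand-ins: a closed-form velocity field for a point-mass "data distribution" and an identity "renderer". The names `prior_velocity`, `rfds_style_grad`, `optimize` and `MU` are illustrative.

```python
import numpy as np

# Hypothetical toy prior: a "rectified flow model" whose data distribution is the
# single point MU. Under x_t = (1 - t) * x0 + t * eps its exact velocity field is
# (x_t - MU) / t, standing in for a pretrained network v_phi(x_t, t).
MU = np.array([0.5, -0.3, 1.2])


def prior_velocity(x_t: np.ndarray, t: float) -> np.ndarray:
    return (x_t - MU) / t


def rfds_style_grad(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """One-sample SDS-style gradient using the velocity residual (assumed form)."""
    t = rng.uniform(0.02, 0.98)          # random timestep, kept away from t = 0
    eps = rng.standard_normal(x.shape)   # fresh Gaussian noise
    x_t = (1.0 - t) * x + t * eps        # point on the straight path
    residual = prior_velocity(x_t, t) - (eps - x)  # predicted minus straight-line velocity
    w = t                                # assumed weighting w(t) = t
    return w * residual                  # identity "renderer": dx/dtheta = I


def optimize(steps: int = 200, lr: float = 0.1, seed: int = 0) -> np.ndarray:
    """Plain gradient descent on the toy parameters using the prior as the loss."""
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(MU)            # toy "scene parameters", rendered as-is
    for _ in range(steps):
        theta = theta - lr * rfds_style_grad(theta, rng)
    return theta


theta = optimize()  # ends up very close to MU, the mode of the toy prior
```

For this point-mass prior the noise cancels exactly and the weighted residual reduces to x - MU, so gradient descent drives the parameters to the prior's mode. With a real model and a real renderer, the residual is noisy and only its expectation points toward higher-likelihood renderings.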

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating: 6 · Confidence: 4

Strengths

- The paper proposes the first algorithm that uses rectified flow models as priors, enabling both the exploitation of the implicit information encoded in the rectified flow model and inversion-based image editing with such models.
- In addition to the baseline method, the authors propose an extension named RFDS-Rev, which improves on the baseline RFDS objective, combines the proposed algorithms, and promises improved generation quality.
- The proposed method showcases satisfactory results on Te

Weaknesses

- In the examples provided, there is a significant saturation effect in the results (see Fig. 5, row 2, and Fig. 6, examples from SD3). It is unclear whether this effect results from the proposed method or is a property of rectified flow models.
- While the image editing results seem semantically correct, there appear to be significant changes in the provided images (see Fig. 5) compared to methods such as Null-text Inversion. Despite the fact that the authors provide a user study and CLIP

Reviewer 02 · Rating: 6 · Confidence: 5

Strengths

1. This paper tackles a long-standing problem: SDS. SDS with rectified flow is not good enough, and the proposed method generates sharp results.
2. The proposed method is easy to understand and mostly sound.
3. The proposed method generalizes to a wide range of flow-based methods.
4. The preliminaries are thorough enough to provide the necessary background.
5. Figure 2 greatly helps in understanding the intuition behind RFDS-Rev.
6. The experiments are well organized, progressing from 2D to 3D.

Weaknesses

(Ordered by importance. Resolving them will raise my rating.)
1. Subsection 3.2 should provide a theoretical justification for why optimizing the noise helps RFDS.
2. The choice of competitors for text-based image editing is not sound because it covers only inversion variants. Answering the following questions may improve soundness: Why should we compare only with inversion variants? Why should prompt-to-prompt variants (e.g., DDPM inversion + P2P) be ignored?
3. The text-to-3D results

Reviewer 03 · Rating: 5 · Confidence: 3

Strengths

1. The paper analyzes the refined process of the rectified flow.
2. Using rectified flow as a prior is interesting.
3. The experiments are sufficient.

Weaknesses

* The writing needs to be improved, especially Lines 100-107, which are important but whose logic is somewhat unclear.
* The focus of the paper is a little confusing, covering image inversion, editing, and text-to-3D generation. In my view, text-to-3D must be the key contribution, as it uses the 2D model as the prior.
* What is the difference between the RFDS loss and the SDS loss? It seems that RFDS is the flow-based-model version of SDS.
* What is the speed impact of the iterative application of iR
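Reviewer 03's question about RFDS versus SDS becomes more concrete when the two gradients are written side by side. The SDS line is the standard score distillation gradient, where x = g(θ) is the rendered image, y the text prompt and ε̂_φ the diffusion model's noise prediction. The rectified-flow line is an assumed SDS-style analogue, and the paper's exact weighting and parameterization may differ. It is obtained by replacing the noise prediction with the velocity prediction v_φ along the straight path, so the regression target changes from ε to ε - x.

```latex
\begin{aligned}
\text{SDS:}\quad & x_t = \alpha_t x + \sigma_t \epsilon, &
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} &= \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right] \\
\text{RF (assumed):}\quad & x_t = (1-t)\,x + t\,\epsilon, &
\nabla_\theta \mathcal{L}_{\mathrm{RF}} &= \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(v_\phi(x_t; y, t) - (\epsilon - x)\big)\,\frac{\partial x}{\partial \theta} \right]
\end{aligned}
```

In both cases the Jacobian of the pretrained model is omitted, as in SDS, so only the rendering Jacobian ∂x/∂θ is backpropagated.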

Taxonomy

Topics: Business Process Modeling and Analysis

Methods: Diffusion