Bespoke Solvers for Generative Flow Models

Neta Shaul; Juan Perez; Ricky T. Q. Chen; Ali Thabet; Albert Pumarola; Yaron Lipman

arXiv:2310.19075·cs.LG·October 31, 2023·2 cites

Open Access · 3 Reviews

TL;DR

This paper introduces Bespoke solvers, a new framework for creating custom ODE solvers tailored to pre-trained flow models, significantly improving sampling efficiency and quality with minimal training overhead.

Contribution

It presents a novel, efficient method to construct tailored ODE solvers that outperform existing dedicated solvers in generative flow models.

Findings

01

Bespoke solvers achieve lower FID scores than existing dedicated solvers while using fewer function evaluations (NFE).

02

They improve sampling quality while requiring only about 1% of the pre-training GPU time to train.

03

Effective on CIFAR10 and ImageNet-64 datasets.

Abstract

Diffusion or flow-based models are powerful generative paradigms that are notoriously hard to sample as samples are defined as solutions to high-dimensional Ordinary or Stochastic Differential Equations (ODEs/SDEs) which require a large Number of Function Evaluations (NFE) to approximate well. Existing methods to alleviate the costly sampling process include model distillation and designing dedicated ODE solvers. However, distillation is costly to train and sometimes can deteriorate quality, while dedicated solvers still require relatively large NFE to produce high quality samples. In this paper we introduce "Bespoke solvers", a novel framework for constructing custom ODE solvers tailored to the ODE of a given pre-trained flow model. Our approach optimizes an order consistent and parameter-efficient solver (e.g., with 80 learnable parameters), is trained for roughly 1% of the GPU time…
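The idea of fitting a small set of solver parameters to one specific ODE can be illustrated with a toy sketch. It is not the paper's algorithm: it learns only the step sizes of a 4-step Euler solver (so NFE = 4), omits the scaling component mentioned in the reviews, and uses a finite-difference gradient with a backtracking line search instead of backpropagation. The `velocity` function is a hypothetical stand-in for a pre-trained model's velocity field, and all names here are illustrative assumptions.

```python
import math
import random

def velocity(t, x):
    # Hypothetical stand-in for a pre-trained flow's velocity field u_t(x).
    return -8.0 * math.exp(-4.0 * t) * math.tanh(x)

def rk4_reference(x, steps=200):
    # Fine-grid RK4 solve on [0, 1], used as the "ground-truth" endpoint.
    h = 1.0 / steps
    for i in range(steps):
        t = i * h
        k1 = velocity(t, x)
        k2 = velocity(t + h / 2, x + h / 2 * k1)
        k3 = velocity(t + h / 2, x + h / 2 * k2)
        k4 = velocity(t + h, x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

def step_sizes(theta):
    # Softmax keeps the step sizes positive and summing to 1.
    m = max(theta)
    w = [math.exp(v - m) for v in theta]
    s = sum(w)
    return [v / s for v in w]

def euler_solve(x, theta):
    # Euler solver whose time grid is defined by the learnable theta.
    t = 0.0
    for h in step_sizes(theta):
        x += h * velocity(t, x)
        t += h
    return x

def rmse(theta, starts, targets):
    sq = [(euler_solve(x0, theta) - y) ** 2 for x0, y in zip(starts, targets)]
    return math.sqrt(sum(sq) / len(sq))

def fit_steps(starts, targets, n_steps=4, iters=200, eps=1e-5):
    theta = [0.0] * n_steps  # all-zero theta gives the uniform grid
    best = rmse(theta, starts, targets)
    for _ in range(iters):
        # Forward-difference estimate of the gradient of the RMSE loss.
        grad = []
        for i in range(n_steps):
            bumped = theta[:]
            bumped[i] += eps
            grad.append((rmse(bumped, starts, targets) - best) / eps)
        # Backtracking line search: accept only steps that lower the loss.
        lr = 1.0
        while lr > 1e-8:
            cand = [v - lr * g for v, g in zip(theta, grad)]
            loss = rmse(cand, starts, targets)
            if loss < best:
                theta, best = cand, loss
                break
            lr *= 0.5
        else:
            break  # no improving step found; stop optimizing
    return theta, best

rng = random.Random(0)
starts = [rng.gauss(0.0, 1.0) for _ in range(32)]
targets = [rk4_reference(x0) for x0 in starts]
uniform_rmse = rmse([0.0] * 4, starts, targets)
theta, bespoke_rmse = fit_steps(starts, targets)
print(f"uniform: {uniform_rmse:.4f}  learned: {bespoke_rmse:.4f}")
print("learned step sizes:", [round(h, 3) for h in step_sizes(theta)])
```

The learned solver has only four parameters and is tied to this one velocity field, which mirrors the trade-off described in the abstract: a tiny parameter count and cheap training, in exchange for a solver that is specific to a single pre-trained model.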

Peer Reviews

Decision·ICLR 2024 spotlight

Reviewer 01 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

- The problem of accelerating inference in Diffusion models is important
- The structure of the paper is good, and the paper is generally well written (I am very happy that the authors opted to include Figure 2, and Algorithm boxes 2&3 in the main paper which help to understand the method)
- The idea of learning parameters for a reparameterization of a Neural ODE is original (to the best of my knowledge)
- The authors cover a lot of previous work (however I think some works are misrepresented, s

Weaknesses

**Missing details**: I think some details of the training are missing. How many GT trajectories are used and is the time for computing GT trajectories accounted for when claiming that the method only needs "roughly 1% of the GPU time" compared to training of the Diffusion model? This seems like a very important detail.

**Single guidance value training**: As far as I understand, the authors need to retrain fresh parameters for each guidance value (and compute GT trajectories for each guidance va

Reviewer 02 · Rating 8 · accept, good paper · Confidence 3

Strengths

- Clarity: the proposed method is elegant and it is well-explained in the paper. The derivations are correct to the best of my knowledge.
- Highly practical and effective: the method is also cheap to train, adaptable to many existing architectures, and it leads to significant benefits for simulating flow-based image samplers. The experiments in Tables 1 and 2 are especially convincing

Weaknesses

- No tunability for NFE: one minor weakness of this approach is that training the bespoke solver requires choosing up-front the number of function evaluations to be used in sampling. In contrast, non-instance dependent schemes can adjust to different NFE budgets at sample time, or they can be run 'until convergence' (choosing NFE adaptively for each particle trajectory).
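The contrast this reviewer draws can be made concrete with a small sketch. A fixed-grid solver spends the same NFE on every trajectory, while a generic adaptive solver chooses its NFE per trajectory from an error tolerance. The example below uses a textbook Heun-Euler embedded pair with a simple step-size controller; the `velocity` function is a hypothetical stand-in for a pre-trained model, and none of this is taken from the paper.

```python
import math

def velocity(t, x):
    # Hypothetical stand-in for a pre-trained flow's velocity field u_t(x).
    return -8.0 * math.exp(-4.0 * t) * math.tanh(x)

def fixed_euler(x, n_steps):
    # Fixed budget: exactly n_steps function evaluations for every input.
    h, nfe = 1.0 / n_steps, 0
    for i in range(n_steps):
        x += h * velocity(i * h, x)
        nfe += 1
    return x, nfe

def adaptive_heun_euler(x, tol=1e-3):
    # Adaptive budget: step size, and hence NFE, depends on the trajectory.
    t, h, nfe = 0.0, 0.1, 0
    while t < 1.0:
        h = min(h, 1.0 - t)  # never step past the end of the interval
        k1 = velocity(t, x)
        k2 = velocity(t + h, x + h * k1)
        nfe += 2
        # Error estimate: difference between the Heun and Euler updates.
        err = abs(h * (k2 - k1) / 2)
        if err <= tol:
            x += h * (k1 + k2) / 2  # accept the Heun update
            t += h
        # The estimate scales like h^2, hence the square root in the update.
        h *= min(2.0, max(0.2, 0.9 * math.sqrt(tol / (err + 1e-16))))
    return x, nfe

x_easy, nfe_easy = adaptive_heun_euler(0.0)  # trajectory that never moves
x_hard, nfe_hard = adaptive_heun_euler(2.0)  # trajectory with a fast transient
print("fixed NFE:", fixed_euler(0.0, 4)[1], fixed_euler(2.0, 4)[1])
print("adaptive NFE:", nfe_easy, nfe_hard)
```

A solver whose parameters are trained for a specific step count sits on the fixed-budget side of this comparison, which is the limitation the reviewer describes: changing the NFE budget means training a new set of parameters.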

Reviewer 03 · Rating 8 · accept, good paper · Confidence 4

Strengths

(1) This method addresses an important problem in diffusion/flow models, the choice of time steps and input scaling, by learning these hyperparameters through data-driven optimization against the pretrained model. If trained properly, and if the gap between the RMSE loss and the global truncation loss is sufficiently narrowed, then sampling with this learning-based parameterization yields good results, as the pap

Weaknesses

(1) Even though the RMSE objective is upper bounded by the Lipschitz loss, there can be some gap between these two losses unless the loss converges to zero. (2) The results are not yet validated on larger-scale datasets (beyond size $64\times 64$), like FFHQ or LSUN.
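The gap raised in point (1) can be seen in the generic form of such a bound. The following is a standard telescoping argument for one-step solvers, written in assumed notation rather than the paper's: $\Phi_i$ is the solver's $i$-th step map with Lipschitz constant $L_i$, $x(t)$ is the exact trajectory, $x_i$ are the solver iterates with $x_0 = x(t_0)$, and $d_i = \|x(t_{i+1}) - \Phi_i(x(t_i))\|$ is the local truncation error.

$$
\|x(t_n) - x_n\| \;\le\; \sum_{i=0}^{n-1} \Big( \prod_{j=i+1}^{n-1} L_j \Big)\, d_i
$$

Each term propagates one local error through the remaining steps using only worst-case Lipschitz constants, so the right-hand side can exceed the true endpoint error by a wide margin. The two sides are guaranteed to coincide only when every $d_i$ is zero, which matches the reviewer's remark that the gap closes only as the loss converges to zero.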


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)