TL;DR
FlowOpt introduces a gradient-free optimization method that efficiently controls diffusion-based image generation processes end-to-end, enabling high-quality editing and inversion without backpropagation.
Contribution
It presents a novel zero-order optimization framework that treats the entire diffusion process as a black box, allowing for efficient, training-free image editing and inversion.
Findings
Achieves state-of-the-art results in image editing tasks.
Requires a number of neural function evaluations (NFEs) similar to that of existing methods.
Guarantees convergence under certain step-size conditions.
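To illustrate why a step-size condition can matter for convergence, here is a toy sketch. It is not the paper's Theorem 1 or its proof. It assumes a linear, elementwise stand-in map f(z) = a * z and runs the classical Richardson iteration z <- z - eta * (f(z) - y), which is known to converge when 0 < eta < 2 / max(a). The names `richardson_iterate` and `residual_norm` are hypothetical.

```python
def richardson_iterate(a_diag, y, eta, steps=200):
    """Run z <- z - eta * (f(z) - y) for the toy map f(z) = a_diag * z (elementwise).

    Assumption: a_diag holds positive scalars, so the iteration converges
    exactly when 0 < eta < 2 / max(a_diag).
    """
    z = [0.0] * len(y)
    for _ in range(steps):
        z = [zi - eta * (ai * zi - yi) for zi, ai, yi in zip(z, a_diag, y)]
    return z


def residual_norm(a_diag, z, y):
    """Largest absolute entry of f(z) - y for the toy map."""
    return max(abs(ai * zi - yi) for ai, zi, yi in zip(a_diag, z, y))


if __name__ == "__main__":
    a = [0.5, 1.0, 2.0]          # toy map; bound is eta < 2 / 2.0 = 1.0
    y = [1.0, 1.0, 1.0]
    for eta in (0.9, 1.1):       # one step size inside the bound, one outside
        z = richardson_iterate(a, y, eta)
        print(eta, residual_norm(a, z, y))
```

With a = [0.5, 1.0, 2.0] the bound is eta < 1.0, so eta = 0.9 drives the residual toward zero while eta = 1.1 makes it grow without bound.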
Abstract
The remarkable success of diffusion and flow-matching models has ignited a surge of works on adapting them at test time for controlled generation tasks. Examples range from image editing to restoration, compression and personalization. However, due to the iterative nature of the sampling process in those models, it is computationally impractical to use gradient-based optimization to directly control the image generated at the end of the process. As a result, existing methods typically resort to manipulating each timestep separately. Here we introduce FlowOpt - a zero-order (gradient-free) optimization framework that treats the entire flow process as a black box, enabling optimization through the whole sampling path without backpropagation through the model. Our method is both highly efficient and allows users to monitor the intermediate optimization results and perform early stopping if…
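The black-box idea in the abstract can be sketched on a toy problem. The code below is a minimal illustration under stated assumptions, not the authors' implementation. `toy_flow` is a hypothetical stand-in for a full sampling process (a few Euler steps of a simple velocity field), and `optimize_black_box` uses a derivative-free residual update z <- z - eta * (flow(z) - target), chosen here only because it needs forward evaluations and no backpropagation. The loop records the error at every iteration so progress can be monitored, and it stops early once the error falls below a tolerance.

```python
import math


def toy_flow(z, steps=10, dt=0.1):
    """Hypothetical black box: Euler-integrate a simple velocity field elementwise."""
    x = list(z)
    for _ in range(steps):
        x = [xi + dt * (0.5 * xi + 0.2 * math.sin(xi)) for xi in x]
    return x


def optimize_black_box(flow, target, z0, eta=0.5, max_iters=100, tol=1e-8):
    """Gradient-free loop: only forward calls to `flow`, no derivatives.

    Returns the final input and the per-iteration error history, which a
    caller can inspect to monitor progress or pick an earlier stopping point.
    """
    z = list(z0)
    history = []
    for _ in range(max_iters):
        out = flow(z)                                   # one full forward pass
        residual = [o - t for o, t in zip(out, target)]
        err = max(abs(r) for r in residual)
        history.append(err)
        if err < tol:                                   # early stopping
            break
        z = [zi - eta * ri for zi, ri in zip(z, residual)]
    return z, history


if __name__ == "__main__":
    z_true = [0.3, -1.2, 2.0]
    target = toy_flow(z_true)
    z_hat, history = optimize_black_box(toy_flow, target, [0.0, 0.0, 0.0])
    print(len(history), history[-1])
```

In this toy setting each iteration costs one forward pass of the flow, so the total cost is the iteration count times the number of steps inside `toy_flow`. The step size `eta = 0.5` was picked for this particular toy map and is not a recommendation from the paper.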
Peer Reviews
Decision·Submitted to ICLR 2026
* Theorem 1 provides a sufficient condition on the step size under which the FlowOpt iterations provably converge. This formal analysis of convergence is a valuable addition to flow-based optimization literature, where most prior methods rely on heuristic step-size tuning.
The novelty is limited. The proposed zero-order optimization across the full flow process is conceptually identical to FlowChef [1] (ICCV 2025, arXiv Dec 2024), which already introduced a gradient-free control framework with theoretical guarantees and broad task coverage (inversion, editing, and restoration). The main difference, introducing a step-size bound, is a modest theoretical insight rather than something novel or different. The work lacks comprehensive evaluation on community-standard
The paper is very well written. 1. This paper presents a clean idea of optimising the whole process rather than per-timestep manipulation. 2. The paper also presents a theoretical contribution, i.e., a sufficient condition on the step size for convergence of the optimizer in this setting. 3. The edits look visually appealing and demonstrate a good tradeoff between fidelity and edit strength.
1. Although the paper compares methods quantitatively and qualitatively, a user study is missing. 2. The paper does not discuss how the zero-order method behaves as the problem dimension increases or decreases, even though zero-order methods may converge poorly as the dimension grows.
- The method is novel, and it is initially surprising that it works. The authors provide an analysis and theoretical justification (but I do have concerns regarding the theoretical part, see weaknesses section). - The method itself is simple, and the paper presentation is clear. - The authors performed extensive evaluations against competing methods and the results are plausible (but I do have concerns here, see weaknesses section). - The limitations of the method are clearly discussed in the Ap
### Major Concerns 1. The method requires a relatively large number of NFEs in order to provide an advantage over existing methods (e.g., FireFlow and UniInv) in reconstruction. 2. The authors present a theorem that guarantees the method's convergence under certain assumptions; however, whether and why these assumptions hold in practice is not clear. In addition, I think that the proof itself in Appendix F is potentially flawed, as explained next. Even assuming the condition holds, for the proof to
