TL;DR
FlowAlign is an inversion-free, flow-based image editing framework that uses terminal point regularization to improve trajectory stability and source consistency, enabling more controllable and reversible image edits.
Contribution
It proposes a new regularization method for flow-based image editing that improves trajectory stability and source consistency without requiring latent inversion.
Findings
Outperforms existing methods in source preservation.
Provides more stable and controllable editing trajectories.
Supports reversible image editing by reversing the ODE trajectory.
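The reversibility finding can be illustrated with a minimal sketch, assuming a deterministic ODE solved with explicit Euler steps. The `velocity` function below is a hypothetical toy field, not the pre-trained flow model or FlowAlign's actual drift; the sketch only shows that integrating the same ODE backward in time approximately recovers the starting point.

```python
def velocity(x, t):
    # Hypothetical toy velocity field (an assumption for illustration only);
    # a real editor would query a pre-trained flow model here.
    return [-0.5 * xi + t for xi in x]


def integrate(x, t_start, t_end, n_steps):
    """Explicit Euler integration of dx/dt = velocity(x, t) from t_start to t_end."""
    dt = (t_end - t_start) / n_steps  # negative when integrating backward
    t = t_start
    x = list(x)
    for _ in range(n_steps):
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x


source = [1.0, -2.0, 0.5]
edited = integrate(source, 0.0, 1.0, 1000)      # forward pass: the "edit"
recovered = integrate(edited, 1.0, 0.0, 1000)   # backward pass: undo the edit
max_error = max(abs(a - b) for a, b in zip(source, recovered))
```

With explicit Euler the round trip is only approximate: the reconstruction error shrinks as the number of steps grows, which is one reason smoother trajectories are easier to reverse.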
Abstract
Recent inversion-free, flow-based image editing methods such as FlowEdit leverage a pre-trained noise-to-image flow model such as Stable Diffusion 3, enabling text-driven manipulation by solving an ordinary differential equation (ODE). While the lack of exact latent inversion is a core advantage of these methods, it often results in unstable editing trajectories and poor source consistency. To address this limitation, we propose FlowAlign, a novel inversion-free flow-based framework for consistent image editing with optimal control-based trajectory control. Specifically, FlowAlign introduces source similarity at the terminal point as a regularization term to promote smoother and more consistent trajectories during the editing process. Notably, our terminal point regularization is shown to explicitly balance semantic alignment with the edit prompt and structural consistency with…
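As a rough illustration of how a terminal-point term can trade off edit strength against source consistency, here is a toy sketch. It is not the FlowAlign drift: the names `edit_velocity` and `reg_weight`, the straight-line terminal estimate, and the additive form of the regularizer are all assumptions made for this example.

```python
def edit_velocity(x, target):
    # Hypothetical stand-in for the text-driven editing direction.
    return [ti - xi for xi, ti in zip(x, target)]


def run_edit(source, target, reg_weight, n_steps=100):
    """Euler-integrate a toy editing ODE with an assumed terminal-point penalty."""
    x = list(source)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = edit_velocity(x, target)
        # First-order guess of the terminal point if v stayed constant until t = 1.
        terminal = [xi + (1.0 - t) * vi for xi, vi in zip(x, v)]
        # Assumed regularizer: pull the estimated terminal point toward the source.
        drift = [vi + reg_weight * (si - ei)
                 for vi, si, ei in zip(v, source, terminal)]
        x = [xi + dt * di for xi, di in zip(x, drift)]
    return x


def distance(a, b):
    # Euclidean distance between two points.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


source, target = [0.0, 0.0], [1.0, 2.0]
plain = run_edit(source, target, reg_weight=0.0)
regularized = run_edit(source, target, reg_weight=1.0)
```

In this toy setting, a larger `reg_weight` leaves the result closer to the source while still moving it toward the target, which mirrors the balance between semantic alignment and structural consistency described in the abstract.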
Peer Reviews
Decision·ICLR 2026 Poster
- The optimal-control formulation leads to a drift that explicitly combines semantic editing and source-consistency terms, offering a clear interpretation of how FlowAlign stabilizes trajectories.
- Using a shared SD3 backbone and comparable NFEs, FlowAlign consistently improves background PSNR/SSIM/LPIPS/DINO over SDEdit, DDIB, RF-Inversion, and FlowEdit while maintaining competitive CLIP and HPS scores.
- Reconstructing the original image by integrating backward yields much better fidelity w

- The optimal-control derivation relies on linear assumptions and a first-order approximation to the integrated drift, so the implemented ODE is not tightly characterized by theory and functions more as a guided heuristic.
- The paper positions FlowAlign mainly against multi-step ODE/SDE editors and does not clearly situate it relative to recent fast or one-step editing approaches that optimize for speed–quality trade-offs.
- Results on video and 3D Gaussian splatting are purely qualitative, w

- The method is training-free and inversion-free, so it can be applied to various off-the-shelf flow-based models.
- Compared to baseline methods such as FlowEdit, the proposed method is more computationally efficient.
- The paper is well-written and easy to follow.

- The motivation of this paper is not clear. The authors claim that 'these limitations of FlowEdit arise from non-smooth and unstable editing trajectories' (Line 86). However, there is no experimental analysis in the paper to support this, leaving the claim unsupported.
- The authors claim that FlowEdit is sensitive to its hyperparameters. However, the proposed method also has many hyperparameters that need to be tuned (including but not limited to ω and ζ). Could FlowAlign perform better than

1. Good results with clear qualitative comparisons – the paper presents convincing visual examples that highlight the differences between FlowAlign and baselines. The figures clearly show improved source structure preservation and accurate semantic edits.
2. Human preference study – beyond quantitative metrics, the authors conduct a user study showing that participants consistently prefer FlowAlign’s outputs over baselines. As far as I know, this is the first time human preference is reported fo

1. Trajectory regularization intuition – while trajectory regularization appears effective, the paper could provide a deeper explanation of why smoother trajectories improve editing outcomes. It seems intuitive that smoother trajectories help preserve the original image structure, but it is unclear what trade-offs or costs this introduces. For example, could this regularization limit the strength or flexibility of edits, or bias the model toward minimal changes?
2. Figure 2 clarity – figure 2
Taxonomy
Methods: Diffusion
