TL;DR
Follow-Your-Shape introduces a novel shape-aware image editing framework that enables precise, controllable shape modifications while preserving non-target regions, outperforming existing flow-based models especially in large-scale shape transformations.
Contribution
The paper presents a training-free, mask-free method utilizing Trajectory Divergence Maps and Scheduled KV Injection for improved shape editing, along with a new benchmark for evaluation.
Findings
Achieves superior shape editing fidelity and control.
Effectively handles large-scale shape transformations.
Outperforms existing flow-based models in visual quality.
Abstract
While recent flow-based image editing models demonstrate general-purpose capabilities across diverse tasks, they often struggle to specialize in challenging scenarios -- particularly those involving large-scale shape transformations. When performing such structural edits, these methods either fail to achieve the intended shape change or inadvertently alter non-target regions, resulting in degraded background quality. We propose Follow-Your-Shape, a training-free and mask-free framework that supports precise and controllable editing of object shapes while strictly preserving non-target content. Motivated by the divergence between inversion and editing trajectories, we compute a Trajectory Divergence Map (TDM) by comparing token-wise velocity differences between the inversion and denoising paths. The TDM enables precise localization of editable regions and guides a Scheduled KV Injection…
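The TDM idea in the abstract (token-wise velocity differences between the inversion and denoising paths) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `(T, N, D)` array layout, the mean over timesteps, the min-max normalisation, and the 0.5 threshold are all assumptions made for illustration. The excerpt does not say how the map is consumed by Scheduled KV Injection, so that step is left out.

```python
import numpy as np


def trajectory_divergence_map(v_inv, v_edit, eps=1e-8):
    """Per-token divergence between two velocity trajectories.

    v_inv, v_edit: arrays of shape (T, N, D) holding the predicted velocity
    for T timesteps, N tokens and D channels (hypothetical layout).
    Returns an array of shape (N,) scaled to [0, 1].
    """
    v_inv = np.asarray(v_inv, dtype=np.float64)
    v_edit = np.asarray(v_edit, dtype=np.float64)
    if v_inv.shape != v_edit.shape:
        raise ValueError("velocity trajectories must have the same shape")

    # L2 norm of the velocity difference for every token at every step: (T, N)
    per_step = np.linalg.norm(v_edit - v_inv, axis=-1)

    # Aggregate over timesteps (assumption: a plain mean)
    tdm = per_step.mean(axis=0)

    # Min-max normalise so a single threshold can be applied (assumption)
    lo, hi = tdm.min(), tdm.max()
    return (tdm - lo) / (hi - lo + eps)


def editable_mask(tdm, threshold=0.5):
    """Boolean mask of tokens whose divergence exceeds the threshold."""
    return np.asarray(tdm) > threshold
```

Tokens whose velocities agree along both paths score near 0 and would be treated as background, while tokens where the editing path departs from the inversion path score near 1 and would be treated as editable.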
Peer Reviews
Decision: ICLR 2026 Poster
- The general insight of the paper to use magnitude of trajectory difference for localizing edits is both clever and intuitive.
- The methods were generally well-motivated and clearly explained.
- The paper is overall well-written and polished.

- Figure 2 is quite confusing. For example, is that row in the top right a legend for left and right sides of the figure? If so, it could be labelled more clearly. Also, for the bottom figure, the outline colors of the frames (particularly blue and orange) are very hard to notice.
- The quantitative evaluation only includes results on the paper's custom evaluation dataset, but does not report results of a third-party editing dataset, such as PIE-Bench. Although PIE-Bench does not isolate shape c

- Trajectory-based region control. TDM offers a conceptually grounded way to localize editable regions by exploiting velocity differences in rectified-flow trajectories, avoiding reliance on noisy attention maps or external segmentation.
- Training-free and mask-free pipeline. The method operates purely with a pre-trained FLUX model and does not require hand-drawn or model-generated masks, which reduces annotation overhead and simplifies deployment.
- Consistent empirical gains. Across ReShap

- Global PSNR and LPIPS cannot disentangle background preservation from foreground edits, so the core claim of “preserving non-edited regions” is only indirectly tested. Some region-restricted metrics would provide stronger evidence.
- The paper does not empirically compare TDM to simpler region-selection strategies (e.g. DiffEdit-style prediction differences, cross-attention masks) when used within the same staged KV injection scheme, leaving the unique benefit of TDM somewhat under-quantified

1. The paper identifies a significant and challenging problem. Large-scale shape editing is a major failure case for most SOTA methods.
2. Using divergence between source and target flow trajectories to infer editable regions is original and well-motivated. It moves beyond cross-attention saliency or explicit user masks toward a model-intrinsic notion of semantic locality.
3. The proposed approach can be applied to existing pre-trained flow models without finetuning or additional training data.

1. The proposed method employs ControlNet guidance (depth/canny maps) during the final editing stage to preserve structure and edges. However, none of the baselines (FlowEdit, RF-Solver, KV-Edit, MasaCtrl, DiT4Edit, etc.) use ControlNet or equivalent structural conditioning. As ControlNet introduces a strong external geometric prior, the resulting improvements in PSNR, LPIPS, and boundary fidelity cannot be attributed solely to the proposed TDM mechanism. As a result, the unfair baseline compari
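One review above asks for region-restricted metrics, because global PSNR mixes background preservation with the intended foreground change. A minimal sketch of one such metric, PSNR computed only over pixels outside the edit region, might look like the following. It assumes an edit mask is available for evaluation (for example from annotation); the function name, argument names, and the `[0, 1]` pixel range are illustrative choices, not something taken from the paper or its benchmark.

```python
import numpy as np


def background_psnr(source, edited, edit_mask, max_value=1.0):
    """PSNR restricted to pixels outside the edit region.

    source, edited: images of shape (H, W) or (H, W, C) with values in
    [0, max_value]. edit_mask: boolean array of shape (H, W), True where
    the edit is expected to change the image (hypothetical input).
    """
    source = np.asarray(source, dtype=np.float64)
    edited = np.asarray(edited, dtype=np.float64)
    background = ~np.asarray(edit_mask, dtype=bool)

    if source.shape != edited.shape:
        raise ValueError("source and edited must have the same shape")
    if source.shape[:2] != background.shape:
        raise ValueError("edit_mask must match the image height and width")
    if not background.any():
        raise ValueError("edit_mask covers the whole image")

    # Squared error over background pixels only (all channels included)
    squared_error = (source - edited) ** 2
    mse = squared_error[background].mean()

    if mse == 0:
        return float("inf")  # background is pixel-identical
    return 10.0 * np.log10(max_value ** 2 / mse)
```

With this restriction, a large intended change inside the mask no longer lowers the score, so the number reflects only how much the non-target region drifted.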
