Editing on the Generative Manifold: A Theoretical and Empirical Study of General Diffusion-Based Image Editing Trade-offs
Yi Hu, Leying Yi, Emily Davis, Finn Carter

TL;DR
This paper offers a unified theoretical and empirical framework for diffusion-based image editing, analyzing core usability trade-offs and providing bounds on editing deviations under various constraints.
Contribution
It formalizes diffusion editing as guided transport on a learned manifold, connecting diverse paradigms through a common theoretical lens and introducing task-agnostic metrics.
Findings
Derived bounds linking guidance strength and inversion error to deviations in non-target regions.
Analyzed the propagation of errors and effects of locality constraints under iterative edits.
Benchmarked representative diffusion editing paradigms.
Abstract
Diffusion-based editing has rapidly evolved from curated inpainting tools into general-purpose editors spanning text-guided instruction following, mask-localized edits, drag-based geometric manipulation, exemplar transfer, and training-free composition systems. Despite strong empirical progress, the field lacks a unified treatment of core desiderata that govern practical usability: controllability (how precisely and continuously the user can specify an edit), faithfulness to user intent (semantic alignment to instructions), semantic consistency (preservation of identity and non-target content), locality (containment of changes), and perceptual quality (artifact suppression and detail retention). This paper provides a theoretical and empirical analysis of general diffusion-based image editing, connecting diverse paradigms through a common view of editing as guided transport on a learned…
