Multi-turn Consistent Image Editing
Zijun Zhou, Yingying Deng, Xiangyu He, Weiming Dong, Fan Tang

TL;DR
This paper introduces a multi-turn image editing framework that allows iterative refinements, improving consistency, success rates, and visual fidelity in complex editing tasks compared to existing single-step methods.
Contribution
The paper presents a multi-turn editing approach that combines flow matching for image inversion, a dual-objective Linear Quadratic Regulator (LQR) for stable sampling, and adaptive attention highlighting to improve editability and coherence.
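To make the inversion step concrete, here is a minimal sketch of the round trip that flow-based editing relies on: integrate an ODE from the image to a latent, then integrate back. It is not the authors' implementation. The `velocity` function is a hypothetical stand-in for a learned velocity field, the time convention (t = 0 is the image, t = 1 is the latent) is an assumption, and plain Euler integration is used only for illustration.

```python
def velocity(x, t):
    # Hypothetical stand-in for a learned velocity field v(x, t).
    # A real model would be a neural network; this linear field is an assumption.
    return [-xi for xi in x]


def integrate(x, t_start, t_end, steps):
    """Euler-integrate dx/dt = velocity(x, t) from t_start to t_end."""
    dt = (t_end - t_start) / steps
    x = list(x)
    t = t_start
    for _ in range(steps):
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x


def round_trip_error(image, steps):
    """Invert the image to a latent, map it back, and return the max abs error."""
    latent = integrate(image, 0.0, 1.0, steps)  # inversion: image -> latent
    recon = integrate(latent, 1.0, 0.0, steps)  # sampling: latent -> image
    return max(abs(a - b) for a, b in zip(image, recon))


image = [0.8, -0.3, 0.5]  # toy "image" with three pixel values
coarse_error = round_trip_error(image, steps=10)
fine_error = round_trip_error(image, steps=100)
print(coarse_error, fine_error)
```

Even in this toy setting the reconstruction is not exact, and the error shrinks as the step count grows. In a multi-turn setting each turn repeats such a round trip, which is one way to see why inversion accuracy matters for limiting error accumulation.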
Findings
Significantly higher edit success rates
Improved visual fidelity in edited images
Enhanced multi-turn coherence and stability
Abstract
Many real-world applications, such as interactive photo retouching, artistic content creation, and product design, require flexible and iterative image editing. However, existing image editing methods primarily focus on achieving the desired modifications in a single step and often struggle with ambiguous user intent, complex transformations, or the need for progressive refinements. As a result, these methods frequently produce inconsistent outcomes or fail to meet user expectations. To address these challenges, we propose a multi-turn image editing framework that enables users to iteratively refine their edits, progressively achieving more satisfactory results. Our approach leverages flow matching for accurate image inversion and a dual-objective Linear Quadratic Regulator (LQR) for stable sampling, effectively mitigating error accumulation. Additionally, by analyzing the…
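For readers unfamiliar with LQR, the sketch below shows the textbook finite-horizon, discrete-time regulator for a scalar system, solved with the backward Riccati recursion. It is a generic illustration only and does not reproduce the paper's dual-objective formulation. The system `x[k+1] = a*x[k] + b*u[k]`, the cost weights `q`, `r`, `q_final`, and all numeric values are assumptions chosen for the example.

```python
def lqr_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for x[k+1] = a*x[k] + b*u[k].

    Minimizes sum(q*x^2 + r*u^2) + q_final*x[N]^2 and returns the
    feedback gains K[0..N-1] for the control law u[k] = -K[k] * x[k].
    """
    p = q_final  # cost-to-go weight at the final step
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()  # the recursion runs backward in time
    return gains


def simulate(x0, a, b, gains):
    """Roll the closed-loop system forward and return the state trajectory."""
    x = x0
    trajectory = [x]
    for k in gains:
        u = -k * x  # state-feedback control
        x = a * x + b * u
        trajectory.append(x)
    return trajectory


# Assumed toy values: a marginally stable system driven toward zero.
gains = lqr_gains(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=20)
trajectory = simulate(x0=5.0, a=1.0, b=1.0, gains=gains)
print(trajectory[0], trajectory[-1])
```

The controller trades off state deviation (`q`) against control effort (`r`). Without control this system would stay at its initial state, while the feedback law pulls the state toward the target at every step.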
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
Methods: Softmax · Attention Is All You Need · Focus
