Multi-turn Consistent Image Editing

Zijun Zhou; Yingying Deng; Xiangyu He; Weiming Dong; Fan Tang

arXiv:2505.04320 · cs.CV · May 8, 2025

TL;DR

This paper introduces a multi-turn image editing framework that lets users refine an edit over several rounds. On complex editing tasks, it improves consistency, edit success rates, and visual fidelity compared with existing single-step methods.

Contribution

It presents a novel multi-turn editing approach that combines flow matching for image inversion, a Linear Quadratic Regulator (LQR) for stable sampling, and adaptive attention highlighting to improve editability and coherence.
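The summary does not spell out the controller, so the following is only a generic sketch of what a "dual-objective" LQR can look like: a finite-horizon, discrete-time LQR whose stage cost penalizes distance to two hypothetical targets (for example, a source-preserving target and an edit target) plus control effort. The function name `dual_objective_lqr`, the dynamics x_{k+1} = x_k + u_k, and the weights `w_a`, `w_b`, and `r` are all assumptions made for illustration, not the paper's formulation.

```python
import numpy as np


def dual_objective_lqr(x0, target_a, target_b, w_a=1.0, w_b=1.0, r=1.0, steps=10):
    """Hypothetical finite-horizon LQR for x_{k+1} = x_k + u_k, applied per dimension.

    Stage cost: w_a*||x_k - a||^2 + w_b*||x_k - b||^2 + r*||u_k||^2.
    The two quadratic tracking terms combine into a single term centred on the
    weighted mean c = (w_a*a + w_b*b) / (w_a + w_b), so a standard LQR is solved
    in the shifted coordinate z = x - c.
    """
    q = w_a + w_b
    c = (w_a * np.asarray(target_a, dtype=float) + w_b * np.asarray(target_b, dtype=float)) / q

    # Backward Riccati recursion with A = B = 1 (scalar, identical for every dimension).
    p = q  # terminal weight P_N = Q
    gains = []
    for _ in range(steps):
        k = p / (p + r)      # K = (B P B + R)^-1 B P A
        gains.append(k)
        p = q + p - p * k    # P = Q + A P A - A P B K
    gains.reverse()  # gains[k] is now the gain applied at time step k

    # Forward rollout with the optimal feedback u_k = -K_k (x_k - c).
    x = np.asarray(x0, dtype=float).copy()
    traj = [x.copy()]
    for k in gains:
        x = x - k * (x - c)
        traj.append(x.copy())
    return np.stack(traj), c


if __name__ == "__main__":
    traj, c = dual_objective_lqr(np.array([4.0, -3.0]), np.zeros(2), np.full(2, 2.0))
    print("compromise target:", c)
    print("final state:", traj[-1])
```

Because both tracking terms are quadratic, they collapse into one target at their weighted mean, and the Riccati recursion stays scalar. Raising `w_a` relative to `w_b` pulls the trajectory toward the first objective, and raising `r` makes each step smaller and smoother. Since every gain lies strictly between 0 and 1, the state approaches the compromise target monotonically without overshooting.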

Findings

01. Significantly higher edit success rates

02. Improved visual fidelity in edited images

03. Enhanced multi-turn coherence and stability

Abstract

Many real-world applications, such as interactive photo retouching, artistic content creation, and product design, require flexible and iterative image editing. However, existing image editing methods primarily focus on achieving the desired modifications in a single step, an approach that often struggles with ambiguous user intent, complex transformations, or the need for progressive refinement. As a result, these methods frequently produce inconsistent outcomes or fail to meet user expectations. To address these challenges, we propose a multi-turn image editing framework that enables users to iteratively refine their edits, progressively achieving more satisfactory results. Our approach leverages flow matching for accurate image inversion and a dual-objective Linear Quadratic Regulator (LQR) for stable sampling, effectively mitigating error accumulation. Additionally, by analyzing the…
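The abstract does not give the inversion procedure. As a rough illustration of how naive ODE inversion accumulates error, the sketch below inverts a point by integrating a flow ODE dx/dt = v(x, t) forward with explicit Euler steps and then regenerates it by integrating backward. `toy_velocity` is a hypothetical closed-form field standing in for a trained flow-matching network, and the time convention (t = 0 for the image, t = 1 for the latent) is an assumption.

```python
import numpy as np


def toy_velocity(x, t):
    """Hypothetical stand-in for a learned flow-matching velocity v_theta(x, t)."""
    return np.sin(2.0 * x) + 0.5 * t * x


def euler_invert(x_image, n_steps):
    """Integrate dx/dt = v(x, t) from t=0 (image) to t=1 (latent) with explicit Euler."""
    h = 1.0 / n_steps
    x = np.asarray(x_image, dtype=float).copy()
    for i in range(n_steps):
        x = x + h * toy_velocity(x, i * h)
    return x


def euler_sample(x_latent, n_steps):
    """Integrate the same ODE backward from t=1 to t=0 with explicit Euler."""
    h = 1.0 / n_steps
    x = np.asarray(x_latent, dtype=float).copy()
    for i in range(n_steps, 0, -1):
        x = x - h * toy_velocity(x, i * h)
    return x


def reconstruction_error(x_image, n_steps):
    """Max absolute error after one invert-then-sample round trip."""
    latent = euler_invert(x_image, n_steps)
    recon = euler_sample(latent, n_steps)
    return float(np.max(np.abs(recon - np.asarray(x_image, dtype=float))))


if __name__ == "__main__":
    x = np.array([0.3, -1.2, 2.0])
    for n in (10, 100, 1000):
        print(n, reconstruction_error(x, n))
```

Each backward step evaluates the velocity at a slightly different point than the matching forward step did, so the round trip does not return exactly to the input. The discrepancy shrinks roughly in proportion to the step size. In a multi-turn setting, such residuals can compound from one turn to the next. The abstract says its LQR-based sampling is designed to mitigate this kind of error accumulation.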

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques

Methods: Softmax · Attention Is All You Need · Focus