InsightEdit: Towards Better Instruction Following for Image Editing
Yingjing Xu, Jie Kong, Jiazhi Wang, Xiao Pan, Bo Lin, Qiang Liu

TL;DR
InsightEdit introduces a novel dataset and a multimodal approach for instruction-based image editing, significantly improving complex instruction adherence and background consistency over previous methods.
Contribution
The paper presents a large-scale, high-quality dataset and a two-stream multimodal model that makes better use of both image and text information for image editing.
Findings
Achieves state-of-the-art results in complex instruction following.
Maintains high background consistency in edited images.
Outperforms previous methods in visual quality and instruction adherence.
Abstract
In this paper, we focus on the task of instruction-based image editing. Previous works such as InstructPix2Pix, InstructDiffusion, and SmartEdit have explored end-to-end editing. However, two limitations remain. First, existing datasets suffer from low resolution, poor background consistency, and overly simplistic instructions. Second, current approaches mainly condition on the text and leave the rich image information underexplored, so they are inferior at following complex instructions and maintaining background consistency. Targeting these issues, we first curate the AdvancedEdit dataset using a novel data construction pipeline, forming a large-scale dataset with high visual quality, complex instructions, and good background consistency. Then, to further inject the rich image information, we introduce a two-stream bridging mechanism utilizing both the textual and visual features…
Taxonomy
Topics: Scientific Computing and Data Management · Open Education and E-Learning · Image Retrieval and Classification Techniques
