Image Inpainting Models are Effective Tools for Instruction-guided Image Editing
Xuan Ju, Junhao Zhuang, Zhaoyang Zhang, Yuxuan Bian, Qiang Xu, Ying Shan

TL;DR
This paper presents a simple yet effective approach for instruction-guided image editing using image inpainting models, achieving high success rates without joint training of language and image models.
Contribution
The authors demonstrate that connecting language models and image inpainting models via intermediary guidance outperforms joint training methods for image editing.
Findings
High success rate in instruction-guided image editing
Effective use of inpainting models with language guidance
Outperforms joint training approaches
Abstract
This is the technical report for the winning solution of the CVPR 2024 GenAI Media Generation Challenge Workshop's Instruction-guided Image Editing track. Instruction-guided image editing has been widely studied in recent years. The most advanced methods, such as SmartEdit and MGIE, usually combine large language models with diffusion models through joint training, where the former provide text understanding and the latter provide image generation. However, in our experiments, we find that simply connecting large language models and image generation models through intermediary guidance such as masks, instead of joint fine-tuning, leads to better editing performance and a higher success rate. We use a four-step process, IIIE (Inpainting-based Instruction-guided Image Editing): editing category classification, main editing object identification, editing mask acquisition, and image…
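The staged design described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the edit categories, and the keyword rules are hypothetical stand-ins for the language-model and segmentation components, the "image" is a small grid of region labels rather than pixels, and the final step is assumed to be an inpainting call because the abstract is truncated at that point and the method is named inpainting-based. The point is only to show how a mask, rather than joint fine-tuning, carries information from the language side to the image side.

```python
from typing import List

# Toy "image": a grid of region labels instead of pixels (illustrative assumption).
Image = List[List[str]]
Mask = List[List[bool]]


def classify_category(instruction: str) -> str:
    """Step 1 stand-in: keyword rules in place of a language model.

    The category names are hypothetical, not taken from the paper.
    """
    first_word = instruction.strip().lower().split()[0]
    return first_word if first_word in {"remove", "add", "replace"} else "other"


def identify_object(instruction: str) -> str:
    """Step 2 stand-in: treat the last word as the main editing object."""
    return instruction.strip().rstrip(".").split()[-1].lower()


def acquire_mask(image: Image, target: str) -> Mask:
    """Step 3 stand-in: mark every cell whose label matches the target."""
    return [[cell == target for cell in row] for row in image]


def inpaint(image: Image, mask: Mask, fill: str) -> Image:
    """Step 4 stand-in (assumed to be inpainting): overwrite masked cells only."""
    return [
        [fill if masked else cell for cell, masked in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def run_pipeline(image: Image, instruction: str, fill: str) -> Image:
    """Chain the four stages; the mask is the only link between text and image."""
    category = classify_category(instruction)
    target = identify_object(instruction)
    mask = acquire_mask(image, target)
    if category == "other":
        return image  # this toy handles only the hypothetical categories above
    return inpaint(image, mask, fill)


scene = [["sky", "sky"], ["dog", "grass"]]
edited = run_pipeline(scene, "Remove the dog", fill="grass")
```

In this sketch each stage can be swapped independently, for example replacing the keyword rules with a real language model, without retraining the stage that modifies the image.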
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · AI in cancer detection · Advanced Image and Video Retrieval Techniques
Methods: Diffusion · Inpainting
