TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing
Xinyu Zhang, Mengxue Kang, Fei Wei, Shuang Xu, Yuhe Liu, Lin Ma

TL;DR
This paper introduces TIE, a novel image editing framework that leverages multimodal LLMs and diffusion models to interpret complex prompts and produce high-fidelity, consistent edited images, surpassing existing methods.
Contribution
The paper proposes a new framework combining Chain-of-Thought reasoning with diffusion models and a lightweight multimodal LLM to improve complex-prompt image editing.
Findings
Outperforms state-of-the-art models in image editing tasks.
Enhances understanding of complex prompts for more accurate edits.
Maintains high image fidelity and consistency before and after editing.
Abstract
As the field of image generation rapidly advances, traditional diffusion models and those integrated with multimodal large language models (LLMs) still struggle to interpret complex prompts and to keep the image consistent before and after editing. To tackle these challenges, we present an image editing framework that uses the Chain-of-Thought (CoT) reasoning and localization capabilities of multimodal LLMs to help diffusion models generate more refined images. We first design a CoT process comprising instruction decomposition, region localization, and detailed description. We then fine-tune the LISA model, a lightweight multimodal LLM, using the CoT process of multimodal LLMs and the mask of the edited image. By providing the diffusion models with knowledge of the generated prompt and image mask, our models generate images with a…
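The three CoT outputs named in the abstract (decomposed instructions, a localized region, and a detailed description) can be pictured as a small data structure handed to a generator together with a mask. The sketch below is illustrative only and is not the authors' implementation: `EditPlan`, `composite`, `apply_edit`, and the stand-in `generator` callable are hypothetical names, images are toy nested lists rather than tensors, and the mask compositing step is a common inpainting convention assumed here to show how a mask can keep unedited pixels unchanged.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]  # toy grayscale image: rows of pixel values
Mask = List[List[int]]   # 1 = editable pixel, 0 = keep the original pixel


@dataclass
class EditPlan:
    """Hypothetical container for the three CoT outputs named in the abstract."""
    sub_instructions: List[str]  # instruction decomposition
    mask: Mask                   # region localization
    description: str             # detailed description, used as the prompt


def composite(original: Image, generated: Image, mask: Mask) -> Image:
    """Take generated pixels where mask == 1 and original pixels elsewhere."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


def apply_edit(
    original: Image,
    plan: EditPlan,
    generator: Callable[[Image, str], Image],
) -> Image:
    """Run a stand-in generator on the prompt, then restrict its output to the mask."""
    generated = generator(original, plan.description)
    return composite(original, generated, plan.mask)


# Toy usage: the "generator" paints every pixel 99; only the masked pixel changes.
original = [[10, 10], [10, 10]]
plan = EditPlan(
    sub_instructions=["replace the top-left object"],
    mask=[[1, 0], [0, 0]],
    description="a red ball",
)
edited = apply_edit(original, plan, lambda img, prompt: [[99] * len(row) for row in img])
```

In this toy run only the pixel under the mask takes the generated value, which is the sense in which a mask can preserve consistency outside the edited region. A real system would replace the lambda with a diffusion model conditioned on the prompt and mask.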
Taxonomy
Topics: Additive Manufacturing and 3D Printing Technologies · Modular Robots and Swarm Intelligence
Methods: Diffusion
