TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing

Xinyu Zhang; Mengxue Kang; Fei Wei; Shuang Xu; Yuhe Liu; Lin Ma

arXiv:2405.16803 · cs.CV · May 28, 2024

Open Access

TL;DR

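This paper introduces TIE, an image editing framework in which a multimodal LLM interprets a complex prompt and localizes the region to edit, and a diffusion model then generates the edited image from the resulting prompt and mask. The authors report higher fidelity and better consistency with the original image than existing methods.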

Contribution

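The paper proposes a framework that pairs a Chain-of-Thought (CoT) process (instruction decomposition, region localization, detailed description) with a fine-tuned lightweight multimodal LLM (LISA), whose generated prompt and image mask guide a diffusion model during complex-prompt image editing.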

Findings

01. Outperforms state-of-the-art models in image editing tasks.

02. Enhances understanding of complex prompts for more accurate edits.

03. Maintains high image fidelity and consistency before and after editing.

Abstract

As the field of image generation rapidly advances, traditional diffusion models and those integrated with multimodal large language models (LLMs) still encounter limitations in interpreting complex prompts and in preserving image consistency before and after editing. To tackle these challenges, we present an image editing framework that employs the Chain-of-Thought (CoT) reasoning and localizing capabilities of multimodal LLMs to aid diffusion models in generating more refined images. We first design a CoT process comprising instruction decomposition, region localization, and detailed description. Subsequently, we fine-tune the LISA model, a lightweight multimodal LLM, using the CoT process of multimodal LLMs and the mask of the edited image. By providing the diffusion models with knowledge of the generated prompt and image mask, our models generate images with a…
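The three CoT stages named in the abstract (instruction decomposition, region localization, detailed description) can be pictured as producing one record per sub-edit, each carrying a mask and a region prompt. The sketch below is illustrative only and is not the authors' implementation: `EditStep`, `composite`, `apply_steps`, and the `edit_fn` callback (a stand-in for a diffusion editor) are hypothetical names, images are plain grayscale lists, and the mask compositing step is a generic way to keep pixels outside the edited region unchanged, which the abstract does not confirm TIE uses.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]   # grayscale pixels, row-major (assumption for the sketch)
Mask = List[List[bool]]   # True where editing is allowed


@dataclass
class EditStep:
    """One hypothetical output of the CoT process for a single sub-edit."""
    instruction: str   # sub-instruction from instruction decomposition
    mask: Mask         # region from region localization
    description: str   # detailed description used as the region prompt


def composite(original: Image, edited: Image, mask: Mask) -> Image:
    """Take edited pixels inside the mask and original pixels outside it."""
    return [
        [e if m else o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]


def apply_steps(
    image: Image,
    steps: List[EditStep],
    edit_fn: Callable[[Image, str, Mask], Image],
) -> Image:
    """Apply each sub-edit in order, confining every change to its mask."""
    out = [row[:] for row in image]  # copy so the input image is not mutated
    for step in steps:
        candidate = edit_fn(out, step.description, step.mask)
        out = composite(out, candidate, step.mask)
    return out
```

In this sketch, `edit_fn` may return a fully regenerated image; `composite` discards everything it produced outside the mask, so unrelated regions stay identical to the input.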


Taxonomy

Topics: Additive Manufacturing and 3D Printing Technologies · Modular Robots and Swarm Intelligence

Methods: Diffusion