FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing
Tianshuo Yuan, Yuxiang Lin, Jue Wang, Zhi-Qi Cheng, Xiaolong Wang, Jiao GH, Wei Chen, Xiaojiang Peng

TL;DR
FlexEdit introduces an end-to-end image editing approach that combines free-shape masks with language instructions. It uses a vision large language model (VLLM) and a Mask Enhance Adapter (MEA) to improve editing accuracy and user-friendliness.
Contribution
We propose FlexEdit, which effectively integrates free-shape masks with language instructions using a VLLM and a new Mask Enhance Adapter for improved image editing performance.
Findings
Achieves state-of-the-art results on LLM-based image editing tasks.
Introduces the FSMI-Edit benchmark, covering 8 types of free-shape masks.
Demonstrates the effectiveness of simple prompting techniques.
Abstract
Combining Vision Large Language Models (VLLMs) with diffusion models offers a powerful method for executing image editing tasks based on human language instructions. However, language instructions alone often fall short of accurately conveying user requirements, particularly when users want to add or replace elements in specific areas of an image. Masks can effectively indicate the exact locations or elements to be edited, but they require users to draw precise shapes at the desired locations, which is highly user-unfriendly. To address this, we propose FlexEdit, an end-to-end image editing method that leverages both free-shape masks and language instructions for Flexible Editing. Our approach employs a VLLM to comprehend the image content, mask, and user instructions. Additionally, we introduce the Mask Enhance Adapter (MEA) that fuses the embeddings of the VLLM with…
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Augmented Reality Applications
Methods: Adapter · Diffusion
