FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction

Runze He; Kai Ma; Linjiang Huang; Shaofei Huang; Jialin Gao; Xiaoming Wei; Jiao Dai; Jizhong Han; Si Liu

arXiv:2409.18071·cs.CV·September 27, 2024

TL;DR

FreeEdit is a mask-free, reference-based image editing method that uses multi-modal instructions and a novel dataset to achieve high-quality, zero-shot editing guided by user language instructions.

Contribution

The paper introduces FreeEdit, a novel approach for reference-based image editing that eliminates manual masks and leverages a new dataset, FreeBench, for training and evaluation.

Findings

01. FreeEdit outperforms existing methods in zero-shot image editing quality.

02. The DRRA module effectively integrates reference details without disrupting self-attention.

03. FreeBench provides a high-quality dataset for reference-based image editing tasks.

Abstract

Introducing user-specified visual concepts in image editing is highly practical as these concepts convey the user's intent more precisely than text-based descriptions. We propose FreeEdit, a novel approach for achieving such reference-based image editing, which can accurately reproduce the visual concept from the reference image based on user-friendly language instructions. Our approach leverages the multi-modal instruction encoder to encode language instructions to guide the editing process. This implicit way of locating the editing area eliminates the need for manual editing masks. To enhance the reconstruction of reference details, we introduce the Decoupled Residual ReferAttention (DRRA) module. This module is designed to integrate fine-grained reference features extracted by a detail extractor into the image editing process in a residual way without interfering with the original…
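
The residual idea behind DRRA, as the abstract describes it, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `scale` parameter, and the omission of learned query/key/value projections are all assumptions made for illustration. It only shows how a reference-attention term can be added on top of an unchanged self-attention output.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Plain scaled dot-product attention (no learned projections; an assumption).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def residual_reference_attention(image_tokens, reference_tokens, scale=1.0):
    # Hypothetical helper: the self-attention branch is computed exactly as it
    # would be without any reference image, so it is left undisturbed.
    self_out = attention(image_tokens, image_tokens, image_tokens)
    # Separate (decoupled) branch: image tokens query the reference features.
    ref_out = attention(image_tokens, reference_tokens, reference_tokens)
    # Reference details enter only as an additive residual term.
    return self_out + scale * ref_out

rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(4, 8))      # 4 image tokens, dimension 8
reference_tokens = rng.normal(size=(3, 8))  # 3 reference tokens, dimension 8
edited = residual_reference_attention(image_tokens, reference_tokens)
print(edited.shape)
```

With `scale=0.0` the sketch reduces to ordinary self-attention, which is one way to read "without interfering with the original" attention: the reference branch can be removed without changing the base computation.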

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications