FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction
Runze He, Kai Ma, Linjiang Huang, Shaofei Huang, Jialin Gao, Xiaoming Wei, Jiao Dai, Jizhong Han, Si Liu

TL;DR
FreeEdit is a mask-free, reference-based image editing method that uses multi-modal instructions and a novel dataset to achieve high-quality, zero-shot editing guided by user language instructions.
Contribution
The paper introduces FreeEdit, a novel approach for reference-based image editing that eliminates manual masks and leverages a new dataset, FreeBench, for training and evaluation.
Findings
FreeEdit outperforms existing methods in zero-shot image editing quality.
The DRRA module effectively integrates reference details without disrupting self-attention.
FreeBench provides a high-quality dataset for reference-based image editing tasks.
Abstract
Introducing user-specified visual concepts in image editing is highly practical as these concepts convey the user's intent more precisely than text-based descriptions. We propose FreeEdit, a novel approach for achieving such reference-based image editing, which can accurately reproduce the visual concept from the reference image based on user-friendly language instructions. Our approach leverages the multi-modal instruction encoder to encode language instructions to guide the editing process. This implicit way of locating the editing area eliminates the need for manual editing masks. To enhance the reconstruction of reference details, we introduce the Decoupled Residual ReferAttention (DRRA) module. This module is designed to integrate fine-grained reference features extracted by a detail extractor into the image editing process in a residual way without interfering with the original…
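The residual integration described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `scale` gate, and the omission of learned query/key/value projections are all assumptions made for illustration. It shows only the general idea of adding a separate reference-attention branch to an unchanged self-attention output.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Plain scaled dot-product attention over lists of vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def residual_refer_attention(x, ref, scale=1.0):
    """Hypothetical decoupled residual reference attention.

    x     : image tokens being edited (list of vectors)
    ref   : reference-detail tokens (list of vectors), assumed to come
            from some detail extractor
    scale : assumed gate on the reference branch (not from the paper)
    """
    # Branch 1: the original self-attention, computed exactly as it
    # would be without any reference input.
    self_out = attention(x, x, x)
    # Branch 2: the same queries attend to reference tokens only.
    ref_out = attention(x, ref, ref)
    # The reference branch is added as a residual, so the self-attention
    # keys and values are never mixed with reference features.
    return [[s + scale * r for s, r in zip(s_row, r_row)]
            for s_row, r_row in zip(self_out, ref_out)]
```

With `scale=0.0` the function returns the self-attention output unchanged, which is the sense in which the reference branch in this sketch is decoupled from the original path.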
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
