LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair

Xue Song; Jiequan Cui; Hanwang Zhang; Jiaxin Shi; Jingjing Chen; Chi Zhang; Yu-Gang Jiang

arXiv:2411.19156 · cs.CV · December 10, 2024

TL;DR

This paper introduces LoC, a framework that learns to generate an instruction-specific LoRA from a single before-after image pair and uses it to edit images, improving interpretability and reusability and broadening the range of supported visual instructions.

Contribution

It proposes a LoRA Reverse optimization technique that enables large-scale training despite the scarcity of quad data, and demonstrates effective image editing with visual instructions.

Findings

01. High-quality image editing aligned with user intent
02. Effective learning from limited paired data
03. Broad applicability to real-world visual instructions

Abstract

In this paper, we propose the LoRA of Change (LoC) framework for image editing with visual instructions, i.e., before-after image pairs. Compared to the ambiguities, insufficient specificity, and diverse interpretations of natural language, visual instructions can accurately reflect users' intent. Building on the success of LoRA in text-based image editing and generation, we dynamically learn an instruction-specific LoRA to encode the "change" in a before-after image pair, enhancing the interpretability and reusability of our model. Furthermore, generalizable models for image editing with visual instructions typically require quad data, i.e., a before-after image pair, along with query and target images. Due to the scarcity of such quad data, existing models are limited to a narrow range of visual instructions. To overcome this limitation, we introduce the LoRA Reverse optimization…
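For readers new to LoRA, the following is a minimal NumPy sketch, not the LoC implementation. It shows the standard LoRA update, in which a frozen weight W is augmented by a rank-r change (alpha / r) · B A, and how a LoRA derived from one before-after pair could then be reused on a new query input. The generator `generate_lora` is a hypothetical toy stand-in (fixed random projections of a feature difference) for the learned network the abstract describes; all function names, dimensions, and feature vectors are illustrative assumptions.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha=1.0):
    """Frozen linear layer plus a LoRA update.

    x: (n, d_in) inputs; W: (d_out, d_in) frozen base weight;
    A: (r, d_in) down-projection; B: (d_out, r) up-projection.
    Computes x W^T + (alpha / r) * x A^T B^T, i.e. the base layer with
    the rank-r weight change delta_W = (alpha / r) * B A added.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


def generate_lora(before_feat, after_feat, d_in, d_out, r, seed=0):
    """HYPOTHETICAL toy generator: maps a before-after pair to LoRA factors.

    A learned network would go here; this stand-in uses fixed random
    projections of the feature difference only to show the data flow.
    """
    rng = np.random.default_rng(seed)
    change = after_feat - before_feat          # (d_feat,) "change" vector
    d_feat = change.shape[0]
    P_a = rng.standard_normal((r * d_in, d_feat)) / np.sqrt(d_feat)
    P_b = rng.standard_normal((d_out * r, d_feat)) / np.sqrt(d_feat)
    A = (P_a @ change).reshape(r, d_in)
    B = (P_b @ change).reshape(d_out, r)
    return A, B


# Usage: derive a LoRA from one (hypothetical) before-after feature pair,
# then apply it to features of a different query input.
d_feat, d_in, d_out, r = 16, 8, 8, 2
rng = np.random.default_rng(1)
W = rng.standard_normal((d_out, d_in))        # frozen base weight
before = rng.standard_normal(d_feat)
after = rng.standard_normal(d_feat)
A, B = generate_lora(before, after, d_in, d_out, r)

query = rng.standard_normal((4, d_in))        # new query features
edited = lora_forward(query, W, A, B, alpha=2.0)
delta_W = (2.0 / r) * B @ A                   # the merged low-rank change
print(edited.shape)  # (4, 8)
```

One property of this toy is that identical before and after features yield zero factors, so the layer falls back to the frozen base weight; whether LoC's learned generator behaves this way is not stated in the abstract.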

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: ALIGN