Towards Efficient Exemplar Based Image Editing with Multimodal VLMs

Avadhoot Jadhav; Ashutosh Srivastava; Abhinav Java; Silky Singh; Tarun Ram Menta; Surgan Jandial; Balaji Krishnamurthy

arXiv:2506.20155·cs.CV·June 26, 2025

TL;DR

This paper introduces a fast, optimization-free method for exemplar-based image editing that leverages pretrained text-to-image diffusion models and multimodal vision-language models, outperforming baselines in efficiency and effectiveness.

Contribution

The work presents a novel, end-to-end pipeline for exemplar-based image editing that is faster and more effective than existing methods, without requiring optimization.

Findings

01 Outperforms baseline methods on multiple types of edits
02 Runs approximately 4x faster than the baseline methods
03 Demonstrates the effectiveness of multimodal VLMs for image editing

Abstract

Text-to-image diffusion models have enabled a wide array of image editing applications. However, capturing every type of edit through text alone can be challenging and cumbersome. Certain edits are ambiguous to describe and are better expressed through an exemplar pair, i.e., a pair of images depicting an image before and after an edit, respectively. In this work, we tackle exemplar-based image editing, the task of transferring an edit from an exemplar pair to one or more content images, by leveraging pretrained text-to-image diffusion models and multimodal VLMs. Even though our end-to-end pipeline is optimization-free, our experiments demonstrate that it outperforms baselines on multiple types of edits while being approximately 4x faster.
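To make the task's input/output contract concrete, the sketch below implements a deliberately naive baseline: it computes the per-pixel difference between the "after" and "before" exemplar images and adds that difference to the content image. This is NOT the authors' method, which uses pretrained diffusion models and multimodal VLMs; the function name `naive_exemplar_transfer` is hypothetical. The sketch assumes all three images are spatially aligned float arrays in [0, 1] of identical shape. It only works for simple appearance-level edits such as a global brightness shift, which illustrates why semantic edits call for learned models.

```python
import numpy as np


def naive_exemplar_transfer(before: np.ndarray, after: np.ndarray, content: np.ndarray) -> np.ndarray:
    """Hypothetical pixel-space baseline for exemplar-based editing.

    Replays the exemplar's per-pixel change on the content image.
    Assumes float images in [0, 1] with identical shapes (H, W, C).
    """
    if not (before.shape == after.shape == content.shape):
        raise ValueError("before, after, and content must share the same shape")
    delta = after - before            # what the exemplar edit changed, pixel by pixel
    edited = content + delta          # apply the same change to the content image
    return np.clip(edited, 0.0, 1.0)  # keep intensities in the valid [0, 1] range


# Usage: the exemplar edit brightens a uniform image by 0.3.
before = np.full((2, 2, 3), 0.2)
after = np.full((2, 2, 3), 0.5)
content = np.full((2, 2, 3), 0.6)
result = naive_exemplar_transfer(before, after, content)
print(result[0, 0])
```

Because the delta is added blindly, any edit that moves, adds, or restyles objects (rather than uniformly shifting colors) would not transfer correctly with this baseline.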

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques

Methods: Diffusion