Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing

Yoonjeon Kim; Soohyun Ryu; Yeonsung Jung; Hyunkoo Lee; Joowon Kim; June Yong Yang; Jaeryong Hwang; Eunho Yang

arXiv:2410.11374 · cs.CV · March 21, 2025

TL;DR

AugCLIP is a novel context-aware evaluation metric for text-guided image editing that adaptively balances preservation and modification by modeling ideal edits in CLIP space, aligning closely with human judgment.

Contribution

This paper introduces AugCLIP, the first adaptive, context-aware metric that effectively balances preservation and modification in image editing evaluation, outperforming existing metrics.

Findings

1. AugCLIP correlates strongly with human evaluations across five datasets.
2. It outperforms existing metrics in diverse editing scenarios.
3. AugCLIP effectively models ideal edits in CLIP space.

Abstract

The development of vision-language and generative models has significantly advanced text-guided image editing, which seeks the preservation of core elements in the source image while implementing modifications based on the target text. However, existing metrics have a context-blindness problem, indiscriminately applying the same evaluation criteria on completely different pairs of source image and target text, biasing towards either modification or preservation. Directional CLIP similarity, the only metric that considers both source image and target text, is also biased towards modification aspects and attends to irrelevant editing regions of the image. We propose AugCLIP, a context-aware metric that adaptively coordinates preservation and modification aspects, depending on the specific context of a given source image and target text. This is done by deriving the CLIP representation of…
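For reference, directional CLIP similarity is commonly defined as the cosine between the edit direction in image embedding space (edited image minus source image) and the direction in text embedding space (target text minus source text). The sketch below is a minimal illustration of that standard definition, not the AugCLIP metric. It assumes the CLIP image and text embeddings have already been computed; the function name and the toy 3-D vectors are hypothetical stand-ins for real encoder outputs.

```python
import math
from typing import Sequence


def _sub(a: Sequence[float], b: Sequence[float]) -> list[float]:
    # Element-wise difference a - b (the "direction" from b to a).
    return [x - y for x, y in zip(a, b)]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm
    # (e.g. the edited image is identical to the source).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def directional_clip_similarity(
    img_src: Sequence[float],
    img_edit: Sequence[float],
    txt_src: Sequence[float],
    txt_tgt: Sequence[float],
) -> float:
    # Hypothetical helper: inputs are assumed to be precomputed CLIP embeddings.
    delta_img = _sub(img_edit, img_src)  # how the image moved
    delta_txt = _sub(txt_tgt, txt_src)   # how the text says it should move
    return _cosine(delta_img, delta_txt)


if __name__ == "__main__":
    score = directional_clip_similarity(
        [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]
    )
    print(score)  # → 1.0
```

Because only the direction of the image change enters the cosine, scaling the change along the text direction (for example, moving the edited embedding to `[1.0, 3.0, 0.0]` in the toy case) leaves the score unchanged. The metric measures whether the image moved the way the text suggests, not how much of the source was kept.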

Taxonomy

Topics: Computer Graphics and Visualization Techniques

Methods: ALIGN · Contrastive Language-Image Pre-training