SpotEdit: Evaluating Visually-Guided Image Editing Methods

Sara Ghazanfari; Wei-An Lin; Haitong Tian; Ersin Yumer

arXiv:2508.18159 · cs.CV · September 30, 2025

TL;DR

SpotEdit is a benchmark for evaluating visually-guided image editing methods across diffusion, autoregressive, and hybrid generative models. It reveals substantial performance gaps between models and shows that leading models often hallucinate a visual cue that is not present.

Contribution

The paper introduces SpotEdit, a comprehensive benchmark that systematically assesses visually-guided image editing models and includes a dedicated component on hallucination.

Findings

01. Significant performance disparities among models.
02. Leading models often hallucinate visual cues.
03. Benchmark exposes limitations in current editing methods.

Abstract

Visually-guided image editing, where edits are conditioned on both visual cues and textual prompts, has emerged as a powerful paradigm for fine-grained, controllable content generation. Although recent generative models have shown remarkable capabilities, existing evaluations remain simple and insufficiently representative of real-world editing challenges. We present SpotEdit, a comprehensive benchmark designed to systematically assess visually-guided image editing methods across diverse diffusion, autoregressive, and hybrid generative models, uncovering substantial performance disparities. To address a critical yet underexplored challenge, our benchmark includes a dedicated component on hallucination, highlighting how leading models, such as GPT-4o, often hallucinate the existence of a visual cue and erroneously perform the editing task. Our code and benchmark are publicly released at…
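
The hallucination component described above can be illustrated with a small sketch. This is not the authors' metric or code: the `EditOutcome` record, its fields, and the `hallucination_rate` function are hypothetical names introduced here, and the sketch assumes each sample is labeled with whether the visual cue is actually present and whether the model carried out the edit.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EditOutcome:
    """One evaluated sample (hypothetical schema, not the benchmark's format)."""
    cue_present: bool     # assumed label: the visual cue really exists in the input
    edit_performed: bool  # assumed judgment: the model went ahead and edited


def hallucination_rate(outcomes: Iterable[EditOutcome]) -> float:
    """Fraction of cue-absent samples on which the model still performed an edit."""
    absent = [o for o in outcomes if not o.cue_present]
    if not absent:
        raise ValueError("no cue-absent samples to evaluate")
    return sum(o.edit_performed for o in absent) / len(absent)


# Made-up outcomes for illustration only; these are not results from the paper.
samples = [
    EditOutcome(cue_present=True, edit_performed=True),
    EditOutcome(cue_present=False, edit_performed=True),   # edit despite missing cue
    EditOutcome(cue_present=False, edit_performed=False),  # model correctly declined
]
print(hallucination_rate(samples))
```

Under these assumptions, a lower value means the model more often declines to edit when the cue is missing. Samples where the cue is present are ignored by this particular measure.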

Code & Models

Datasets

saraghznfri/SpotEditBench
dataset · 20 dl

Taxonomy

Topics: Digital Imaging in Medicine