SpotEdit: Evaluating Visually-Guided Image Editing Methods
Sara Ghazanfari, Wei-An Lin, Haitong Tian, Ersin Yumer

TL;DR
SpotEdit is a benchmark for visually-guided image editing that evaluates diffusion, autoregressive, and hybrid generative models. It reveals substantial performance gaps between models and includes a dedicated hallucination component, which tests whether a model hallucinates the existence of a visual cue and performs the edit anyway.
Contribution
The paper introduces SpotEdit, a comprehensive benchmark that systematically assesses visually-guided image editing models and includes a dedicated component on hallucination, a challenge the authors describe as critical yet underexplored.
Findings
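Performance differs substantially across diffusion, autoregressive, and hybrid generative models.
Leading models, such as GPT-4o, often hallucinate the existence of a visual cue and erroneously perform the edit.
The benchmark exposes limitations of current editing methods that simpler existing evaluations miss.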
Abstract
Visually-guided image editing, where edits are conditioned on both visual cues and textual prompts, has emerged as a powerful paradigm for fine-grained, controllable content generation. Although recent generative models have shown remarkable capabilities, existing evaluations remain simple and insufficiently representative of real-world editing challenges. We present SpotEdit, a comprehensive benchmark designed to systematically assess visually-guided image editing methods across diverse diffusion, autoregressive, and hybrid generative models, uncovering substantial performance disparities. To address a critical yet underexplored challenge, our benchmark includes a dedicated component on hallucination, highlighting how leading models, such as GPT-4o, often hallucinate the existence of a visual cue and erroneously perform the editing task. Our code and benchmark are publicly released at…
Taxonomy
Topics: Digital Imaging in Medicine
