RealDrag: The First Dragging Benchmark with Real Target Image

Ahmad Zafarani; Zahra Dehghanian; Mohammadreza Davoodi; Mohsen Shadroo; MohammadAmin Fazli; Hamid R. Rabiee

arXiv:2512.12287 · cs.CV · December 16, 2025

Open Access

TL;DR

RealDrag introduces the first standardized benchmark dataset with paired ground truth images and novel metrics for evaluating point-based image editing models, enabling fairer and more consistent comparisons.

Contribution

RealDrag provides a dataset with paired ground truth target images, diverse samples, and four new evaluation metrics for point-based image editing.

Findings

01

Evaluated 17 state-of-the-art models systematically.

02

Identified trade-offs among current approaches.

03

Established a reproducible baseline for future research.

Abstract

The evaluation of drag-based image editing models is unreliable due to a lack of standardized benchmarks and metrics. This ambiguity stems from inconsistent evaluation protocols and, critically, the absence of datasets containing ground truth target images, making objective comparisons between competing methods difficult. To address this, we introduce RealDrag, the first comprehensive benchmark for point-based image editing that includes paired ground truth target images. Our dataset contains over 400 human-annotated samples from diverse video sources, providing source/target images, handle/target points, editable region masks, and descriptive captions for both the image and the editing action. We also propose four novel, task-specific metrics: Semantical Distance (SeD), Outer Mask Preserving Score (OMPS), Inner Patch Preserving Score (IPPS), and Directional Similarity (DiS).…
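The per-sample annotations listed in the abstract (source/target images, handle/target points, an editable region mask, and two captions) could be held in a record like the one sketched below. This is only an illustration: the class name `DragSample`, every field name, the (x, y) point convention, and the 0/1 nested-list mask are assumptions made here, not the released RealDrag format, and the helper methods are hypothetical conveniences rather than any of the paper's metrics.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]  # assumed convention: (x, y) pixel coordinates


@dataclass
class DragSample:
    """Hypothetical container for one annotated drag-editing sample."""

    source_image: str            # path to the source frame (assumed to be a file path)
    target_image: str            # path to the real ground truth target frame
    handle_points: List[Point]   # where each drag starts
    target_points: List[Point]   # where each drag should end
    mask: List[List[int]]        # editable region, 1 = editable, indexed as mask[y][x]
    image_caption: str           # caption describing the image
    edit_caption: str            # caption describing the editing action

    def __post_init__(self) -> None:
        # Each handle point needs exactly one matching target point.
        if not self.handle_points:
            raise ValueError("at least one handle point is required")
        if len(self.handle_points) != len(self.target_points):
            raise ValueError("handle_points and target_points must pair up")

    def displacements(self) -> List[Point]:
        """Return the (dx, dy) drag vector for each handle/target pair."""
        return [
            (tx - hx, ty - hy)
            for (hx, hy), (tx, ty) in zip(self.handle_points, self.target_points)
        ]

    def handles_inside_mask(self) -> bool:
        """Check that every handle point lies in the editable region."""
        return all(self.mask[y][x] == 1 for x, y in self.handle_points)


# Toy example with a 4x4 mask whose second row is editable.
sample = DragSample(
    source_image="source.png",
    target_image="target.png",
    handle_points=[(0, 1)],
    target_points=[(3, 1)],
    mask=[[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
    image_caption="a toy image",
    edit_caption="drag the object to the right",
)
print(sample.displacements())
print(sample.handles_inside_mask())
```

Validating the handle/target pairing when the record is built keeps malformed annotations from reaching an evaluation loop; whether the actual benchmark enforces such checks is not stated in the abstract.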

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection · Image and Video Quality Assessment