IE-Critic-R1: Advancing the Explanatory Measurement of Text-Driven Image Editing for Human Perception Alignment
Bowen Qu, Shangkun Sun, Xiaoyu Liang, Wei Gao

TL;DR
This paper introduces a new benchmark and an explainable evaluation model for text-driven image editing, improving alignment with human perception and addressing limitations of previous assessment methods.
Contribution
The work presents IE-Bench, a comprehensive dataset, and IE-Critic-R1, a reinforcement learning-based metric that better correlates with human perception in image editing evaluation.
Findings
IE-Critic-R1 outperforms previous metrics in subjective alignment.
The benchmark includes nearly 4,000 samples with human scores.
The method provides more explainable quality assessments.
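Alignment between a metric and human perception, as claimed in the findings above, is commonly quantified by correlating the metric's outputs with human ratings over the same samples. This summary does not say which correlation measures the paper reports, so the following is only an illustrative sketch that assumes Spearman (rank) and Pearson (linear) coefficients. The function names and the toy scores are hypothetical and are not taken from IE-Bench.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: human opinion scores vs. a metric's predictions.
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_scores = [0.10, 0.40, 0.35, 0.80, 0.90]

print("Spearman:", spearman(human_scores, metric_scores))
print("Pearson: ", pearson(human_scores, metric_scores))
```

Under this assumption, a metric that orders edited images the way human raters do yields a Spearman coefficient near 1, while the Pearson coefficient additionally reflects how linearly the predicted scores track the ratings.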
Abstract
Recent advances in text-driven image editing have been significant, yet accurately evaluating the edited images remains a considerable challenge. Unlike text-driven image generation, text-driven image editing is conditioned simultaneously on both text and a source image. The edited images often retain an intrinsic connection to the original image, and this connection changes dynamically with the semantics of the text. However, previous methods tend to focus solely on text-image alignment or are not well aligned with human perception. In this work, we introduce the Text-driven Image Editing Benchmark suite (IE-Bench) to enhance the assessment of text-driven edited images. IE-Bench includes a database containing diverse source images, various editing prompts, and the corresponding edited results from different editing methods, and nearly…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Humanities and Scholarship · Multimodal Machine Learning Applications
