KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

Yongliang Wu; Zonghui Li; Xinting Hu; Xinyu Ye; Xianfang Zeng; Gang Yu; Wenbo Zhu; Bernt Schiele; Ming-Hsuan Yang; Xu Yang

arXiv:2505.16707 · cs.CV · May 23, 2025

TL;DR

KRIS-Bench is a new benchmark for evaluating the reasoning capabilities of image editing models across factual, conceptual, and procedural knowledge, revealing significant gaps in current models' reasoning abilities.

Contribution

The paper introduces KRIS-Bench, a comprehensive, knowledge-based benchmark with 22 tasks and a novel evaluation protocol for assessing reasoning in image editing models.

Findings

01. Current models show significant reasoning gaps.

02. KRIS-Bench reveals limitations in knowledge-based editing.

03. The benchmark facilitates targeted improvements in model reasoning.

Abstract

Recent advances in multi-modal generative models have enabled significant progress in instruction-based image editing. However, while these models produce visually plausible outputs, their capacity for knowledge-based reasoning editing tasks remains under-explored. In this paper, we introduce KRIS-Bench (Knowledge-based Reasoning in Image-editing Systems Benchmark), a diagnostic benchmark designed to assess models through a cognitively informed lens. Drawing from educational theory, KRIS-Bench categorizes editing tasks across three foundational knowledge types: Factual, Conceptual, and Procedural. Based on this taxonomy, we design 22 representative tasks spanning 7 reasoning dimensions and release 1,267 high-quality annotated editing instances. To support fine-grained evaluation, we propose a comprehensive protocol that incorporates a novel Knowledge Plausibility metric, enhanced by…
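
To make the taxonomy described in the abstract concrete, the following is a minimal sketch of how editing instances tagged with one of the three knowledge types might be represented and summarized. Only the three knowledge type names come from the abstract. The `EditingInstance` fields, the task names, the score range, and the toy values are assumptions for illustration and are not the benchmark's actual schema, metric, or results.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from statistics import mean


class KnowledgeType(Enum):
    # The three knowledge types named in the abstract.
    FACTUAL = "factual"
    CONCEPTUAL = "conceptual"
    PROCEDURAL = "procedural"


@dataclass(frozen=True)
class EditingInstance:
    # Hypothetical record layout; the real annotation schema may differ.
    instance_id: str
    knowledge_type: KnowledgeType
    task: str          # placeholder task name, not an actual benchmark task
    instruction: str   # editing instruction given to the model
    score: float       # placeholder per-instance score, assumed to lie in [0, 1]


def mean_score_by_knowledge_type(instances):
    """Group instances by knowledge type and average their scores."""
    grouped = defaultdict(list)
    for inst in instances:
        grouped[inst.knowledge_type].append(inst.score)
    return {ktype: mean(scores) for ktype, scores in grouped.items()}


# Toy values invented for this example; they are not benchmark results.
toy_instances = [
    EditingInstance("a", KnowledgeType.FACTUAL, "toy_task_1", "toy instruction", 0.5),
    EditingInstance("b", KnowledgeType.FACTUAL, "toy_task_1", "toy instruction", 1.0),
    EditingInstance("c", KnowledgeType.PROCEDURAL, "toy_task_2", "toy instruction", 0.25),
]
summary = mean_score_by_knowledge_type(toy_instances)
for ktype, avg in summary.items():
    print(f"{ktype.value}: {avg:.2f}")
```

Grouping by knowledge type is one plausible way to surface where a model's reasoning is weakest; the same pattern would extend to grouping by task or reasoning dimension if those labels were added as fields.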

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection