WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
Wang Lin, Feng Wang, Majun Zhang, Wentao Hu, Tao Jin, Zhou Zhao, Fei Wu, Jingyuan Chen, Alan Yuille, Sucheng Ren

TL;DR
This paper introduces WorldEdit, a new dataset and evaluation framework for open-world image editing that emphasizes causal reasoning and real-world knowledge, addressing limitations of existing models in implicit instruction understanding.
Contribution
The paper presents WorldEdit, a novel dataset and benchmark designed to improve implicit, causal, and knowledge-driven image editing capabilities in models.
Findings
WorldEdit enables models to better handle implicit and causal instructions.
Fine-tuning with WorldEdit improves performance on causal editing scenarios.
Models trained with WorldEdit outperform existing systems in instruction following and knowledge plausibility.
Abstract
Recent advances in image editing models have demonstrated remarkable capabilities in executing explicit instructions, such as attribute manipulation, style transfer, and pose synthesis. However, these models often face challenges when dealing with implicit editing instructions, which describe the cause of a visual change without explicitly detailing the resulting outcome. These limitations arise because existing models rely on uniform editing strategies that are not equipped to handle the complex world knowledge and reasoning required for implicit instructions. To address this gap, we introduce WorldEdit, a dataset specifically designed to enable world-driven image editing. WorldEdit consists of high-quality editing samples, guided by paraphrased instructions that align with real-world causal logic. Furthermore, we provide WorldEdit-Test for evaluating the existing…
Peer Reviews
Decision·ICLR 2026 Poster
### originality
1. The paper primarily constructs a dataset designed to elicit existing models' capabilities for world knowledge understanding and generation.
2. It employs supervised fine-tuning (SFT) and reinforcement learning with multiple reward signals to train models, thereby validating the effectiveness of the proposed dataset.
### quality
1. The data construction pipeline is rigorous and reliable. The collection, rewriting, construction, and generation of instructions undergo mature fil
1. From a visual inspection perspective, the color tone appears to inherit characteristics from GPT-4o. For instance, in Fig. 8, the pizza and instant noodles exhibit noticeably intensified color saturation. The model may learn to mimic GPT-4o's output distribution, raising concerns about the actual physical accuracy.
2. A discussion of how dataset scale affects model performance and generalization would add significant value to the work.
3. The paper lacks object-level metrics to verify wheth
1. The problem of implicit, world-knowledge image editing is a timely research problem and opens new possibilities for image editing models.
2. The proposed dataset design contains comprehensive editing categories (Time, Temperature, Humidity, Acidity, Light, Break, Inflate, Squeeze, Twist, Stretch).
1. Both the dataset construction pipeline and the benchmark evaluation pipeline rely on proprietary models: Data synthesis and filtering are based on GPT-4o, and model evaluation is based on Qwen-VL-Max. This makes the method difficult to scale up, especially when constructing training datasets, where all the training samples are generated from GPT-4o.
2. Following weakness 1, since the evaluation pipeline is based on Qwen-VL-Max, there could be potential bias from the model during evaluation. A
- The paper presents a novel world knowledge editing method and demonstrates state-of-the-art results with convincing showcased outcomes.
- The authors provide a new dataset along with a corresponding test set, and employ Flow-GRPO to train the Bagel model, reflecting substantial research effort.
- The paper is logically structured and clearly written, with effective comparative results.
See the "Question" part.
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Machine Learning in Materials Science
