ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing
Ying Jin, Pengyang Ling, Xiaoyi Dong, Pan Zhang, Jiaqi Wang, Dahua Lin

TL;DR
ReasonPix2Pix introduces a new dataset and approach to improve image editing models by enhancing their active reasoning capabilities for understanding and executing complex, implicit, and reasoning-based instructions.
Contribution
The paper presents ReasonPix2Pix, a reasoning-attentive instruction editing dataset that enhances models' ability to perform complex, reasoning-based image editing tasks.
Findings
A model fine-tuned on ReasonPix2Pix outperforms prior approaches on instruction-based editing tasks.
The dataset pairs reasoning instructions with more realistic images from fine-grained categories and with larger differences between input and edited images.
Enhanced reasoning improves performance on implicit and explicit instructions.
Abstract
Instruction-based image editing focuses on equipping a generative model with the capacity to adhere to human-written instructions for editing images. Current approaches typically comprehend explicit and specific instructions. However, they often exhibit a deficiency in executing active reasoning capacities required to comprehend instructions that are implicit or insufficiently defined. To enhance active reasoning capabilities and impart intelligence to the editing model, we introduce ReasonPix2Pix, a comprehensive reasoning-attentive instruction editing dataset. The dataset is characterized by 1) reasoning instruction, 2) more realistic images from fine-grained categories, and 3) increased variances between input and edited images. When fine-tuned with our dataset under supervised conditions, the model demonstrates superior performance in instructional editing tasks, independent of…
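As a rough illustration of the kind of data the abstract describes, the sketch below models one editing sample as an input image, an instruction, and an edited image, and separates explicit from reasoning-style instructions so the two groups can be evaluated on their own. The class name, field names, file paths, and the two example instructions are all hypothetical and are not taken from the paper or its released data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EditSample:
    """One instruction-editing sample (hypothetical schema, not the paper's)."""
    input_image: str    # path to the source image
    instruction: str    # human-written editing instruction
    edited_image: str   # path to the target (edited) image
    is_reasoning: bool  # True when the instruction is implicit and needs reasoning


def split_by_instruction_type(
    samples: List[EditSample],
) -> Tuple[List[EditSample], List[EditSample]]:
    """Return (explicit, reasoning) subsets of the given samples."""
    explicit = [s for s in samples if not s.is_reasoning]
    reasoning = [s for s in samples if s.is_reasoning]
    return explicit, reasoning


# Invented examples: the same edit phrased explicitly and implicitly.
samples = [
    EditSample("img_0.png", "Replace the apple with an orange.",
               "img_0_edit.png", is_reasoning=False),
    EditSample("img_0.png", "Swap the fruit for one richer in vitamin C.",
               "img_0_edit.png", is_reasoning=True),
]

explicit, reasoning = split_by_instruction_type(samples)
```

Keeping the instruction type as an explicit flag is only one possible design; it makes it straightforward to report results separately for explicit and reasoning instructions.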
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
