HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

Mude Hui; Siwei Yang; Bingchen Zhao; Yichun Shi; Heng Wang; Peng Wang; Yuyin Zhou; Cihang Xie

arXiv:2404.09990·cs.CV·April 16, 2024·1 cites


TL;DR

HQ-Edit is a large, high-quality dataset for instruction-based image editing created using advanced foundation models, significantly improving model performance and surpassing human-annotated data in quality.

Contribution

The paper presents a scalable pipeline for creating high-quality image editing datasets using GPT-4V and DALL-E 3, and introduces new evaluation metrics for assessing edit quality.

Findings

1. The HQ-Edit dataset contains around 200,000 high-quality edits.

2. Finetuning InstructPix2Pix on HQ-Edit achieves state-of-the-art results.

3. The proposed metrics, Alignment and Coherence, effectively evaluate image editing quality.

Abstract

This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. Unlike prior approaches that rely on attribute guidance or human feedback to build datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. To ensure its high quality, diverse examples are first collected online, expanded, and then used to create high-quality diptychs featuring input and output images with detailed text prompts, followed by precise alignment ensured through post-processing. In addition, we propose two evaluation metrics, Alignment and Coherence, to quantitatively assess the quality of image edit pairs using GPT-4V. HQ-Edit's high-resolution images, rich in detail and accompanied by comprehensive editing prompts, substantially enhance the capabilities of existing image editing models. For…
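The abstract mentions diptychs that hold the input and output images side by side. A minimal sketch of how such a diptych could be separated into an (input, output) pair is given below. It is an illustration only, not the authors' post-processing: the function name `split_diptych`, the `gutter` parameter, and the nested-list image representation are all assumptions, and the alignment step described in the abstract is not implemented here.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # one RGB pixel
Image = List[List[Pixel]]      # image as rows of pixels (assumed representation)


def split_diptych(diptych: Image, gutter: int = 0) -> Tuple[Image, Image]:
    """Split a side-by-side diptych into (left, right) halves.

    `gutter` is a hypothetical number of separator columns in the middle
    of the diptych that should be discarded. Both halves are returned
    with the same width.
    """
    if not diptych or not diptych[0]:
        raise ValueError("diptych must contain at least one pixel")
    width = len(diptych[0])
    if any(len(row) != width for row in diptych):
        raise ValueError("all rows must have the same width")

    half = (width - gutter) // 2
    if gutter < 0 or half <= 0:
        raise ValueError("gutter leaves no room for two halves")

    # Left half: first `half` columns; right half: last `half` columns.
    left = [row[:half] for row in diptych]
    right = [row[width - half:] for row in diptych]
    return left, right


if __name__ == "__main__":
    black, white, grey = (0, 0, 0), (255, 255, 255), (128, 128, 128)
    # 2 rows x 5 columns: two black columns, one grey gutter, two white columns.
    demo = [[black, black, grey, white, white] for _ in range(2)]
    source_img, edited_img = split_diptych(demo, gutter=1)
    print(len(source_img[0]), len(edited_img[0]))
```

In practice the same slicing would be applied to a real image array, with the left half treated as the input image and the right half as the edited output.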

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

1. Compared with previous work, the proposed dataset is of high quality in terms of image resolution, content/edit-type diversity, and prompt-image alignment.
2. The baseline method InstructPix2Pix, finetuned on the proposed HQ-Edit, achieves state-of-the-art performance.
3. The proposed data curation pipeline is scalable because it leverages pretrained generative models (e.g. DALL·E 3) and vision-language models (e.g. GPT-4 / GPT-4V).
4. The paper is well-organized and easy to follow.

Weaknesses

1. Unlike previous work, e.g. MagicBrush, the source images in HQ-Edit are generated by DALL-E 3, which may introduce a distribution bias between AI-generated and photorealistic content.
2. An analysis of the necessity or importance of diptych generation is missing.
3. The two proposed metrics, Alignment and Coherence, are the main metrics used in the evaluation. Given the validated limitation of CLIP directional similarity, other commonly used metrics are missing for quantitative evaluation, which may le

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The proposed HQ-Edit can serve as training data for the instruction-based image editing task, which can promote the development of this area.
2. The performance of the finetuned InstructPix2Pix demonstrates the effectiveness of HQ-Edit.
3. The proposed evaluation metrics are superior to the CLIP score.

Weaknesses

1. It seems that HQ-Edit only contains non-rigid pair data.
2. There are many types of operations in the instruction-based image editing task (e.g., object addition, object removal, non-rigid operation, local transformation, global transformation). Figures 8 and 9 only show the transformation part. The authors should show the results of all these operations to make a comprehensive comparison.
3. Regarding metrics, since there is already a large amount of Human Evaluation Scores, why not us

Reviewer 03 · Rating 5 · Confidence 3

Strengths

1. High-Quality Dataset: The paper introduces HQ-Edit, a dataset with approximately 200,000 high-quality image edits, which is a significant contribution to the field of instruction-based image editing.
2. Advanced Foundation Models: Leveraging state-of-the-art models like GPT-4V and DALL-E 3 ensures that the dataset benefits from the latest advancements in AI, leading to high-resolution and detailed images.
3. Broad Coverage of Editing Operations: HQ-Edit covers a wide range of editing tasks, f

Weaknesses

1. Synthetic Data Limitations: Although the synthetic images are useful for training, the trained model may not perform well on real images.
2. Constrained Persuasiveness of Evaluation Metrics: The paper only reports comparisons on the two evaluation metrics it proposes, Alignment and Coherence, without comparisons on more widely used metrics.


Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis