EditHF-1M: A Million-Scale Rich Human Preference Feedback for Image Editing

Zitong Xu; Huiyu Duan; Zhongpeng Ji; Xinyun Zhang; Yutao Liu; Xiongkuo Min; Ke Gu; Jian Zhang; Shusong Xu; Jinwei Chen; Bo Li; Guangtao Zhai

arXiv:2603.14916 · cs.CV · March 17, 2026

TL;DR

This paper introduces EditHF-1M, a large-scale dataset with human preferences for image editing, and develops evaluation and reward models that improve alignment with human judgments and enhance image editing performance.

Contribution

The paper presents a novel million-scale dataset for human preferences in image editing and proposes a multimodal evaluation model and reward framework to improve image editing quality.

Findings

1. EditHF achieves superior human preference alignment.
2. Fine-tuning with EditHF-Reward improves image editing models.
3. The dataset and models enhance evaluation and optimization of image editing.

Abstract

Recent text-guided image editing (TIE) models have achieved remarkable progress, yet many edited images still suffer from issues such as artifacts, unintended edits, and unaesthetic content. Although some benchmarks and methods have been proposed for evaluating edited images, scalable evaluation models are still lacking, which limits the development of human-feedback reward models for image editing. To address these challenges, we first introduce EditHF-1M, a million-scale image editing dataset with over 29M human preference pairs and 148K human mean opinion ratings, both evaluated along three dimensions, i.e., visual quality, instruction alignment, and attribute preservation. Based on EditHF-1M, we propose EditHF, an evaluation model built on a multimodal large language model (MLLM), to provide human-aligned feedback for image editing. Finally, we introduce…
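
The abstract describes two kinds of human labels, pairwise preferences and mean opinion ratings, each tied to one of three dimensions, but this excerpt does not say how EditHF-Reward consumes them. As an assumption for illustration only, the sketch below shows a hypothetical record for one preference pair and the Bradley-Terry pairwise loss that reward models are commonly trained with. `PreferencePair`, `DIMENSIONS`, and `bradley_terry_loss` are invented names, not the authors' schema or code.

```python
import math
from dataclasses import dataclass

# The three annotation dimensions named in the abstract (identifier spelling is hypothetical).
DIMENSIONS = ("visual_quality", "instruction_alignment", "attribute_preservation")


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical record for one human judgment between two edits of the same source.

    Not the EditHF-1M release format, which this summary does not describe.
    """
    source_id: str
    instruction: str
    preferred_edit: str  # identifier of the edited image the annotator chose
    rejected_edit: str   # identifier of the other edited image
    dimension: str       # which entry of DIMENSIONS the judgment refers to

    def __post_init__(self) -> None:
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension}")


def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood that the preferred edit wins under Bradley-Terry.

    P(preferred beats rejected) = sigmoid(s_p - s_r), so the loss is
    -log sigmoid(s_p - s_r) = softplus(-(s_p - s_r)).
    Assumption: this loss is shown as a typical choice, not the paper's objective.
    """
    x = -(score_preferred - score_rejected)
    # Numerically stable softplus: log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

A usage example: `PreferencePair("src_001", "make the sky orange", "edit_a", "edit_b", "instruction_alignment")` builds one record, and a reward model's two scores for `edit_a` and `edit_b` would be passed to `bradley_terry_loss`. Writing the loss as a stable softplus rather than `-log(sigmoid(...))` avoids overflow and `log(0)` when the score margin is large in either direction.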

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Digital Humanities and Scholarship