Leveraging Verifier-Based Reinforcement Learning in Image Editing
Hanzhong Guo, Jie Wu, Jie Liu, Yu Gao, Zilyu Ye, Linxiao Yuan, Xionghui Wang, Yizhou Yu, Weilin Huang

TL;DR
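This paper introduces Edit-R1, a framework that builds a chain-of-thought verifier-based reasoning reward model (Edit-RRM) for image editing. Unlike existing reward models that return a single overall score, Edit-RRM provides fine-grained, interpretable feedback, which leads to better editing performance.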
Contribution
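The paper proposes a chain-of-thought verifier-based reward model and a reinforcement learning framework for image editing. It targets a specific weakness of existing edit reward models: they give overall scores without detailed checks, ignoring the differing requirements of each instruction and producing biased rewards.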
Findings
Edit-RRM outperforms existing vision-language models when used as a reward model for image editing.
Performance improves consistently with larger model sizes from 3B to 7B parameters.
Applying Edit-R1 improves image editing models such as FLUX.1-kontext.
Abstract
While Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm for text-to-image generation, its application to image editing remains largely unexplored. A key bottleneck is the lack of a robust general reward model for all editing tasks. Existing edit reward models usually give overall scores without detailed checks, ignoring different instruction requirements and causing biased rewards. To address this, we argue that the key is to move from a simple scorer to a reasoning verifier. We introduce Edit-R1, a framework that builds a chain-of-thought (CoT) verifier-based reasoning reward model (RRM) and then leverages it for downstream image editing. The Edit-RRM breaks instructions into distinct principles, evaluates the edited image against each principle, and aggregates these checks into an interpretable, fine-grained reward. To build such an RRM, we first apply…
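The decompose, verify, and aggregate flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `PrincipleCheck`, `aggregate_reward`, and `score_edit` are hypothetical, the verifier is passed in as a plain callable standing in for the reasoning model, and the aggregation rule (fraction of principles satisfied) is an assumption, since the abstract does not say how the per-principle checks are combined.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PrincipleCheck:
    """Result of checking one principle (hypothetical structure)."""
    principle: str   # one requirement derived from the edit instruction
    passed: bool     # the verifier's verdict for this requirement
    rationale: str   # short explanation, kept so the reward stays interpretable


def aggregate_reward(checks: List[PrincipleCheck]) -> float:
    """Assumed aggregation rule: the fraction of principles that passed."""
    if not checks:
        raise ValueError("at least one principle check is required")
    return sum(c.passed for c in checks) / len(checks)


def score_edit(
    principles: List[str],
    verify: Callable[[str], Tuple[bool, str]],
) -> Tuple[float, List[PrincipleCheck]]:
    """Check every principle with `verify`, then aggregate into one reward.

    `verify` is a stand-in for the verifier model: it takes a principle
    and returns (passed, rationale) for the edited image under review.
    """
    checks = []
    for principle in principles:
        passed, rationale = verify(principle)
        checks.append(PrincipleCheck(principle, passed, rationale))
    return aggregate_reward(checks), checks
```

With a toy verifier that accepts "make the car red" and rejects "keep the background unchanged", `score_edit` returns a reward of 0.5 together with the two per-principle records, so the caller can see which requirement failed rather than only a single score.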
