The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation

Weijia Mao; Hao Chen; Zhenheng Yang; Mike Zheng Shou

arXiv:2511.20256 · cs.CV · November 26, 2025

TL;DR

This paper introduces Adv-GRPO, an RL framework for image generation that uses an adversarial, reference-based reward system with foundation models, improving image quality, aesthetics, and robustness against reward hacking.

Contribution

The paper proposes Adv-GRPO, a novel RL approach that employs an adversarial reward guided by reference images and foundation models, reducing reward hacking and enhancing image quality.

Findings

01. Outperforms Flow-GRPO and SD3 in human evaluations.

02. Achieves 70.0% and 72.4% win rates in image quality and aesthetics, respectively.

03. Uses dense visual rewards for consistent improvements.

Abstract

A reliable reward function is essential for reinforcement learning (RL) in image generation. Most current RL approaches depend on pre-trained preference models that output scalar rewards to approximate human preferences. However, these rewards often fail to capture human perception and are vulnerable to reward hacking, where higher scores do not correspond to better images. To address this, we introduce Adv-GRPO, an RL framework with an adversarial reward that iteratively updates both the reward model and the generator. The reward model is supervised using reference images as positive samples and can largely avoid being hacked. Unlike KL regularization that constrains parameter updates, our learned reward directly guides the generator through its visual outputs, leading to higher-quality images. Moreover, while optimizing existing reward functions can alleviate reward hacking, their…
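The alternating scheme described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a 2-D Gaussian "generator" with a learnable mean in place of an image model, a logistic-regression "reward model" in place of a foundation-model-based one, and a bare group-normalized policy-gradient step with no clipping or KL term. All function names and hyperparameters here are hypothetical.

```python
import numpy as np

def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (GRPO-style advantages)."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def update_reward_model(w, b, real, fake, lr=0.1):
    """One logistic-regression step: reference samples are positives (label 1),
    generated samples are negatives (label 0)."""
    x = np.vstack([real, fake])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))      # predicted P(sample is reference)
    w = w - lr * (x.T @ (p - y)) / len(y)       # gradient of the mean log loss
    b = b - lr * float(np.mean(p - y))
    return w, b

def update_generator(mu, samples, rewards, lr=0.1):
    """Score-function policy-gradient step for a unit-variance Gaussian:
    the gradient of log N(x; mu, I) with respect to mu is (x - mu)."""
    adv = group_advantages(rewards)
    grad = np.mean(adv[:, None] * (samples - mu), axis=0)
    return mu + lr * grad

def train_toy(steps=200, group_size=16, dim=2, seed=0):
    """Alternate reward-model and generator updates on toy data (assumed setup)."""
    rng = np.random.default_rng(seed)
    reference_mean = np.full(dim, 2.0)           # stand-in for reference images
    mu, w, b = np.zeros(dim), np.zeros(dim), 0.0
    for _ in range(steps):
        fake = mu + rng.normal(size=(group_size, dim))
        real = reference_mean + rng.normal(size=(group_size, dim))
        w, b = update_reward_model(w, b, real, fake)   # adversarial reward update
        mu = update_generator(mu, fake, fake @ w + b)  # reward = reward-model logit
    return mu, reference_mean

if __name__ == "__main__":
    mu, ref = train_toy()
    print("generator mean:", mu, "reference mean:", ref)
```

In this toy setup the reward is refit against fresh generator samples at every step, so the generator cannot keep exploiting a fixed scorer. As with other adversarial schemes, the generator mean may oscillate around the reference rather than settle exactly on it.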

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection · Aesthetic Perception and Analysis