From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation

Ziwei Huang; Ying Shu; Hao Fang; Quanyu Long; Wenya Wang; Qiushi Guo; Tiezheng Ge; Leilei Gan

arXiv:2510.18263·cs.LG·April 23, 2026

TL;DR

This paper introduces Customized-GRPO, a reinforcement learning framework that improves subject-driven image generation by balancing identity preservation and prompt adherence through novel reward shaping and dynamic weighting.

Contribution

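The paper proposes a new reinforcement learning method with two innovations, Synergy-Aware Reward Shaping (SARS) and Time-Aware Dynamic Weighting (TDW), that address the limitations of naive GRPO: conflicting gradient signals from static linear reward aggregation and misalignment with the temporal dynamics of the diffusion process.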

Findings

01. Outperforms naive GRPO baselines in experiments.

02. Effectively balances identity preservation and prompt adherence.

03. Mitigates competitive degradation in image generation.

Abstract

Subject-driven image generation models face a fundamental trade-off between identity preservation (fidelity) and prompt adherence (editability). While online reinforcement learning (RL), specifically GRPO, offers a promising solution, we find that a naive application of GRPO leads to competitive degradation, as the simple linear aggregation of rewards with static weights causes conflicting gradient signals and a misalignment with the temporal dynamics of the diffusion process. To overcome these limitations, we propose Customized-GRPO, a novel framework featuring two key innovations: (i) Synergy-Aware Reward Shaping (SARS), a non-linear mechanism that explicitly penalizes conflicted reward signals and amplifies synergistic ones, providing a sharper and more decisive gradient; (ii) Time-Aware Dynamic Weighting (TDW), which aligns the optimization pressure with the model's temporal…
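To make the naive baseline criticized in the abstract concrete, here is a minimal sketch of static linear reward aggregation followed by GRPO-style group-relative advantage normalization. It is an illustration under stated assumptions, not the authors' implementation: the function names, the equal weights of 0.5, and the reward scores are all hypothetical, and the SARS and TDW formulas are not given in the text above, so they are not sketched here.

```python
import math

def combine_rewards(identity_reward, prompt_reward, w_identity=0.5, w_prompt=0.5):
    """Naive aggregation: a fixed-weight linear sum of the two rewards.

    The weights are hypothetical; the abstract only says they are static.
    """
    return w_identity * identity_reward + w_prompt * prompt_reward

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical (identity, prompt) reward pairs for one group of four samples.
group = [
    (0.9, 0.2),  # strong identity, weak prompt adherence (conflicted)
    (0.2, 0.9),  # weak identity, strong prompt adherence (conflicted)
    (0.6, 0.6),  # both moderately good (synergistic)
    (0.3, 0.3),  # both poor
]

rewards = [combine_rewards(i, p) for i, p in group]
advantages = group_relative_advantages(rewards)

for (i, p), r, a in zip(group, rewards, advantages):
    print(f"identity={i:.1f} prompt={p:.1f} reward={r:.2f} advantage={a:+.3f}")
```

With these made-up scores, the two conflicted samples receive identical scalar rewards and therefore identical advantages, even though they fail in opposite directions. That loss of information is one way to read the "conflicting gradient signals" the abstract attributes to linear aggregation with static weights.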
