GARDO: Reinforcing Diffusion Models without Reward Hacking
Haoran He, Yuxiao Ye, Jie Liu, Jiajun Liang, Zhiyong Wang, Ziyang Yuan, Xintao Wang, Hangyu Mao, Pengfei Wan, Ling Pan

TL;DR
GARDO is a flexible reinforcement learning framework for diffusion models that selectively regularizes uncertain samples, adaptively updates reference models, and boosts rewards for diverse high-quality outputs, effectively reducing reward hacking and improving diversity.
Contribution
GARDO introduces selective regularization and an adaptive reference-update mechanism for fine-tuning diffusion models without reward hacking.
Findings
Mitigates reward hacking across various proxy rewards
Enhances generation diversity without sacrificing sample efficiency
Improves exploration through adaptive regularization
Abstract
Fine-tuning diffusion models via online reinforcement learning (RL) has shown great potential for enhancing text-to-image alignment. However, since precisely specifying a ground-truth objective for visual tasks remains challenging, the models are often optimized using a proxy reward that only partially captures the true goal. This mismatch often leads to reward hacking, where proxy scores increase while real image quality deteriorates and generation diversity collapses. While common solutions add regularization against the reference policy to prevent reward hacking, they compromise sample efficiency and impede the exploration of novel, high-reward regions, as the reference policy is usually sub-optimal. To address the competing demands of sample efficiency, effective exploration, and mitigation of reward hacking, we propose Gated and Adaptive Regularization with Diversity-aware…
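To make the idea of selective regularization concrete, the following is a minimal sketch, not the authors' implementation. It assumes each sample comes with a proxy reward, a KL estimate against the reference policy, and some scalar uncertainty score; the function name `gated_objective`, the hard threshold gate, and the coefficient `beta` are all hypothetical choices made for illustration only.

```python
from typing import List, Sequence


def gated_objective(
    proxy_rewards: Sequence[float],
    kl_to_reference: Sequence[float],
    uncertainties: Sequence[float],
    beta: float = 0.1,
    uncertainty_threshold: float = 0.5,
) -> List[float]:
    """Per-sample objective: proxy reward minus a KL penalty that is
    applied only to samples whose uncertainty reaches the threshold.

    Everything here is an illustrative assumption: the paper's actual
    gating rule and uncertainty measure are not given in this summary.
    """
    if not (len(proxy_rewards) == len(kl_to_reference) == len(uncertainties)):
        raise ValueError("all inputs must have the same length")

    objectives = []
    for reward, kl, uncertainty in zip(proxy_rewards, kl_to_reference, uncertainties):
        # Hypothetical hard gate: 1.0 for uncertain samples, 0.0 otherwise.
        gate = 1.0 if uncertainty >= uncertainty_threshold else 0.0
        # Confident samples keep their full reward; uncertain ones are
        # pulled back toward the reference policy.
        objectives.append(reward - beta * gate * kl)
    return objectives


if __name__ == "__main__":
    print(gated_objective(
        proxy_rewards=[1.0, 2.0, 5.0],
        kl_to_reference=[1.0, 1.0, 1.0],
        uncertainties=[0.1, 0.2, 0.9],
        beta=0.5,
    ))
```

In this sketch only the third sample is penalized, so low-uncertainty samples remain free to move away from the reference policy. This is one way to read the trade-off between exploration and reward-hacking mitigation that the abstract describes.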
Taxonomy
Topics: Cell Image Analysis Techniques · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques
