Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models
Jiajun Fan, Tong Wei, Chaoran Cheng, Yuxin Chen, Ge Liu

TL;DR
This paper introduces ADRPO, an adaptive regularization method for reinforcement learning fine-tuning of generative models. It adjusts regularization strength per sample to balance exploration and exploitation, and reports improved results on text-to-image, LLM, and multi-modal fine-tuning.
Contribution
ADRPO adjusts the strength of divergence regularization during policy optimization based on advantage estimates, giving a better balance between exploration and exploitation when fine-tuning generative models.
Findings
Outperforms fixed regularization methods in text-to-image generation.
Enables smaller models to surpass larger models in attribute binding and diversity.
Improves fine-tuning of LLMs and multi-modal models, surpassing larger models on several benchmarks.
Abstract
Balancing exploration and exploitation during reinforcement learning fine-tuning of generative models presents a critical challenge, as existing approaches rely on fixed divergence regularization that creates an inherent dilemma: strong regularization preserves model capabilities but limits reward optimization, while weak regularization enables greater alignment but risks instability or reward hacking. We introduce Adaptive Divergence Regularized Policy Optimization (ADRPO), which automatically adjusts regularization strength based on advantage estimates, reducing regularization for high-value samples while applying stronger regularization to poor samples. This enables policies to navigate between exploration and aggressive exploitation according to data quality. Our implementation with Wasserstein-2 regularization for flow matching generative models achieves remarkable results on…
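The idea of advantage-dependent regularization can be illustrated with a minimal sketch. The abstract excerpt does not give the exact rule, so everything here is an assumption for illustration only: the linear form `beta_base - advantage`, the clipping bounds, and the function names `adaptive_betas` and `regularized_loss` are hypothetical and are not taken from the paper. The sketch shows only the qualitative behavior described above: samples with high advantage receive a smaller regularization coefficient, and samples with low advantage receive a larger one.

```python
def adaptive_betas(advantages, beta_base=1.0, beta_min=0.0, beta_max=2.0):
    """Per-sample regularization coefficients (hypothetical rule).

    Assumption: the coefficient shrinks linearly as the advantage grows
    and is clipped to [beta_min, beta_max]. The paper's actual schedule
    may differ.
    """
    return [min(beta_max, max(beta_min, beta_base - a)) for a in advantages]


def regularized_loss(advantages, scores, divergences, beta_base=1.0):
    """Mean of (-advantage * score + beta * divergence) over samples.

    `scores` stands in for whatever policy term the method optimizes
    (for example a log-likelihood or likelihood ratio), and `divergences`
    for a per-sample divergence from the reference model. Both are
    placeholders, not the paper's definitions.
    """
    betas = adaptive_betas(advantages, beta_base)
    terms = [
        -a * s + b * d
        for a, s, b, d in zip(advantages, scores, betas, divergences)
    ]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    advantages = [1.5, 0.0, -2.0]  # good, neutral, and poor samples
    print(adaptive_betas(advantages))
    print(regularized_loss(advantages, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))
```

With a fixed coefficient, all three samples would be penalized equally for drifting from the reference model. In this sketch the high-advantage sample is left almost unregularized while the poor sample is pulled back most strongly.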
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies · Artificial Intelligence in Games
