Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models

Jiajun Fan; Tong Wei; Chaoran Cheng; Yuxin Chen; Ge Liu

arXiv:2510.18053·cs.LG·October 22, 2025

TL;DR

This paper introduces ADRPO, an adaptive regularization method for reinforcement learning fine-tuning of generative models. It adjusts regularization strength dynamically to balance exploration and exploitation, and reports improved results on text-to-image generation, LLM fine-tuning, and multi-modal models.

Contribution

ADRPO adjusts the strength of divergence regularization during policy optimization based on advantage estimates, giving a better balance between exploration and exploitation when fine-tuning generative models.

Findings

1. Outperforms fixed regularization methods in text-to-image generation.

2. Enables smaller models to surpass larger models in attribute binding and diversity.

3. Improves fine-tuning of LLMs and multi-modal models, surpassing larger models in several benchmarks.

Abstract

Balancing exploration and exploitation during reinforcement learning fine-tuning of generative models presents a critical challenge, as existing approaches rely on fixed divergence regularization that creates an inherent dilemma: strong regularization preserves model capabilities but limits reward optimization, while weak regularization enables greater alignment but risks instability or reward hacking. We introduce Adaptive Divergence Regularized Policy Optimization (ADRPO), which automatically adjusts regularization strength based on advantage estimates, reducing regularization for high-value samples while applying stronger regularization to poor samples. This enables policies to navigate between exploration and aggressive exploitation according to data quality. Our implementation with Wasserstein-2 regularization for flow matching generative models achieves remarkable results on…
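The idea of an advantage-dependent regularization strength can be illustrated with a small sketch. The code below is not the paper's implementation: the linear schedule, the clipping bounds, and the names `normalize_advantages`, `adaptive_beta`, `adaptive_penalty`, `beta0`, and `scale` are all assumptions made for illustration. It only shows the qualitative behavior the abstract describes, where samples with a high advantage receive a smaller divergence coefficient and samples with a low advantage receive a larger one.

```python
from statistics import mean, pstdev


def normalize_advantages(rewards):
    """Center and scale rewards within a batch to get advantage estimates."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        # All rewards are equal, so no sample is better than another.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def adaptive_beta(advantage, beta0=1.0, scale=0.5, beta_min=0.0, beta_max=2.0):
    """Hypothetical schedule: the coefficient decreases linearly as the
    advantage increases, clipped to [beta_min, beta_max].
    This form is an assumption, not the schedule from the paper."""
    return min(beta_max, max(beta_min, beta0 - scale * advantage))


def adaptive_penalty(rewards, divergences, **schedule):
    """Batch-mean divergence penalty with one coefficient per sample.
    `divergences` stands in for any per-sample divergence estimate
    (for example a Wasserstein-2 or KL term computed elsewhere)."""
    advantages = normalize_advantages(rewards)
    betas = [adaptive_beta(a, **schedule) for a in advantages]
    return mean([b * d for b, d in zip(betas, divergences)])


if __name__ == "__main__":
    rewards = [1.0, 2.0, 3.0]
    advantages = normalize_advantages(rewards)
    for r, a in zip(rewards, advantages):
        print(f"reward={r:.1f} advantage={a:+.3f} beta={adaptive_beta(a):.3f}")
```

With a fixed coefficient, every sample would be penalized equally regardless of its reward. In this sketch the lowest-reward sample is held closest to the reference model, while the highest-reward sample is allowed to move furthest from it.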

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies · Artificial Intelligence in Games