Beyond the Dirac Delta: Mitigating Diversity Collapse in Reinforcement Fine-Tuning for Versatile Image Generation

Jinmei Liu; Haoru Li; Zhenhong Sun; Chaofeng Chen; Yatao Bian; Bo Wang; Daoyi Dong; Chunlin Chen; Zhi Wang

arXiv:2601.12401·cs.LG·January 21, 2026

TL;DR

This paper introduces DRIFT, a reinforcement learning framework that mitigates diversity collapse when fine-tuning large generative models, balancing task alignment with output diversity for versatile image generation.

Contribution

DRIFT systematically incentivizes output diversity during RL fine-tuning, addressing the diversity collapse problem and improving versatility in image generation tasks.

Findings

01 Achieves a 9.08% to 43.46% increase in diversity at the same alignment level.

02 Yields a 59.65% to 65.86% improvement in alignment at the same diversity level.

03 Outperforms existing methods in balancing diversity and task alignment.

Abstract

Reinforcement learning (RL) has emerged as a powerful paradigm for fine-tuning large-scale generative models, such as diffusion and flow models, to align with complex human preferences and user-specified tasks. A fundamental limitation remains the curse of diversity collapse, where the objective formulation and optimization landscape inherently collapse the policy to a Dirac delta distribution. To address this challenge, we propose DRIFT (DiveRsity-Incentivized Reinforcement Fine-Tuning for Versatile Image Generation), an innovative framework that systematically incentivizes output diversity throughout the on-policy fine-tuning process, reconciling strong task alignment with high generation diversity to enhance versatility essential for applications that demand diverse candidate generations. We approach the problem across…
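The collapse described in the abstract can be illustrated on a toy problem. The sketch below is not DRIFT and makes no claim about the paper's method, which the truncated abstract does not describe. It assumes a four-outcome categorical policy with made-up rewards, and it uses a generic entropy bonus (weight `beta`, a hypothetical parameter) as a stand-in for any diversity incentive. Pure reward maximization pushes nearly all probability mass onto the single best outcome, while the entropy-regularized objective settles at a policy proportional to `exp(reward / beta)`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    """Shannon entropy (nats) of a categorical distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def train(rewards, beta=0.0, lr=1.0, steps=20000):
    """Exact gradient ascent on E[r] + beta * H(p) for a softmax policy.

    beta=0.0 is plain reward maximization; beta>0 adds an entropy bonus
    (an illustrative assumption, not the method proposed in the paper).
    """
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        p = softmax(logits)
        j = sum(pk * rk for pk, rk in zip(p, rewards))  # expected reward
        h = entropy(p)
        for k, (pk, rk) in enumerate(zip(p, rewards)):
            grad = pk * (rk - j)  # d E[r] / d logit_k
            if beta > 0.0 and pk > 0.0:
                grad -= beta * pk * (math.log(pk) + h)  # beta * dH / d logit_k
            logits[k] += lr * grad
    return softmax(logits)

# Toy rewards (hypothetical): three near-equally good outcomes, one poor one.
rewards = [1.0, 0.9, 0.8, 0.2]

p_plain = train(rewards)            # reward only
p_ent = train(rewards, beta=0.5)    # reward + entropy bonus

print("reward only    :", [round(q, 4) for q in p_plain], "H =", round(entropy(p_plain), 4))
print("entropy bonus  :", [round(q, 4) for q in p_ent], "H =", round(entropy(p_ent), 4))
```

In this toy setting the reward-only policy approaches a point mass even though the second and third outcomes are almost as good as the first, which is the kind of lost diversity the abstract refers to. The entropy bonus trades a small amount of expected reward for coverage of the near-optimal outcomes.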

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics · Music Technology and Sound Studies