When Policy Entropy Constraint Fails: Preserving Diversity in Flow-based RLHF via Perceptual Entropy

Xiaofeng Tan; Jun Liu; Bin-Bin Gao; Yuanting Fan; Xi Jiang; Chengjie Wang; Hongsong Wang; Feng Zheng

arXiv:2605.12112·cs.CV·May 13, 2026

TL;DR

This paper shows that policy entropy regularization fails to preserve diversity in flow-based RLHF, because policy entropy stays constant while perceptual diversity collapses. It introduces perceptual entropy as a remedy and reports improved diversity and quality in experiments.

Contribution

The paper proposes perceptual entropy to better measure and preserve diversity in flow models, overcoming the limitations of policy entropy regularization.

Findings

01. Perceptual entropy maintains diversity where policy entropy fails.

02. Perceptual entropy-based strategies improve the quality-diversity trade-off.

03. Experiments show significant gains in diversity and quality metrics.

Abstract

Reinforcement learning from human feedback (RLHF) is widely used to align flow-matching text-to-image models with human preferences, but it often leads to severe diversity collapse after fine-tuning. In RL, diversity is often assumed to correlate with policy entropy, which motivates entropy regularization. However, we show that this intuition breaks down in flow models: policy entropy remains constant even while perceptual diversity collapses. We explain this mismatch both theoretically and empirically: the constant entropy arises from the fixed, pre-defined noise schedule, while the diversity collapse is driven by the mode-seeking nature of policy gradients. As a result, policy entropy fails to prevent the model from converging to a narrow high-reward region in the perceptual space. To address this, we introduce perceptual entropy, which captures diversity in a perceptual space while retaining the properties of standard entropy. Building upon this insight,…
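The mismatch described in the abstract can be illustrated with a toy sketch. It assumes each sampling step is a Gaussian whose standard deviation is set by a fixed schedule, so that training can only move the mean. Under that assumption the step-wise entropy depends on the schedule alone, while the spread of the final samples depends on the learned drift. Everything here is hypothetical and chosen for illustration: the `rollout` helper, the schedule values, the two drift functions, and the use of mean pairwise distance as a diversity proxy. It is not the paper's sampler and not its definition of perceptual entropy.

```python
import numpy as np

def rollout(drift, sigmas, n=256, dim=8, dt=0.1, seed=0):
    """Run a toy chain of Gaussian steps.

    Returns the final samples and a Monte Carlo estimate of the summed
    per-step policy entropy, -E[log p(x_next | x)].
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, dim))  # initial noise
    neg_logp = np.zeros(n)
    for sigma in sigmas:
        mean = x + drift(x) * dt  # only the mean is "learned"
        x_next = mean + sigma * rng.normal(size=(n, dim))
        resid = x_next - mean
        # Negative log-density of N(mean, sigma^2 I) evaluated at x_next.
        neg_logp += (0.5 * dim * np.log(2.0 * np.pi * sigma**2)
                     + 0.5 * (resid**2).sum(axis=1) / sigma**2)
        x = x_next
    return x, float(neg_logp.mean())

def mean_pairwise_distance(x):
    """Average Euclidean distance between distinct rows (crude diversity proxy)."""
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    n = x.shape[0]
    return float(dists.sum() / (n * (n - 1)))

# Hypothetical fixed noise schedule shared by both policies.
sigmas = np.linspace(0.9, 0.1, 10)

# "Base" policy: no drift, so the initial spread is kept.
x_base, entropy_base = rollout(lambda x: np.zeros_like(x), sigmas)

# "Collapsed" policy: drift pulls every sample toward one point (a single mode).
x_coll, entropy_collapsed = rollout(lambda x: 5.0 * (2.0 - x), sigmas)

diversity_base = mean_pairwise_distance(x_base)
diversity_collapsed = mean_pairwise_distance(x_coll)

print("policy entropy estimate:", entropy_base, entropy_collapsed)
print("mean pairwise distance: ", diversity_base, diversity_collapsed)
```

In this sketch both policies draw the same noise, so their entropy estimates agree up to floating-point rounding, whereas the pairwise distance of the collapsed policy is much smaller. A regularizer built on the first quantity therefore carries no signal about the second, which is the gap a diversity measure defined in a perceptual space is meant to close.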

Code & Models

Repositories

https://xiaofeng-tan.github.io/projects/PEC