TL;DR
This paper analyzes how entropy shapes the interaction between Chain-of-Thought (CoT) and Reinforcement Learning (RL) in text-to-image (T2I) generation, showing how it affects exploration, stability, and image quality, and proposes EG-GRPO, an entropy-guided optimization method.
Contribution
The paper provides a systematic entropy-based analysis of CoT and RL interaction in T2I generation and introduces EG-GRPO, a novel entropy-guided fine-tuning strategy.
Findings
Lower entropy in the textual CoT leads to better downstream image quality.
High entropy in tokens encourages exploration without collapse.
EG-GRPO achieves state-of-the-art results on benchmarks.
Abstract
Combining Chain-of-Thought (CoT) with Reinforcement Learning (RL) improves text-to-image (T2I) generation, yet the underlying interaction between CoT's exploration and RL's optimization remains unclear. We present a systematic entropy-based analysis that yields three key insights: (1) CoT expands the generative exploration space, while RL contracts it toward high-reward regions; (2) final reward is strongly negatively correlated with both the mean and variance of image-token entropy, highlighting the need to reduce uncertainty and instability; and (3) the entropy of the textual CoT directly governs downstream image quality, with lower-entropy CoTs leading to better generations. Motivated by these findings, we propose Entropy-Guided Group Relative Policy Optimization (EG-GRPO), a fine-tuning strategy that reallocates optimization budget by uncertainty: low-entropy tokens are excluded…
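The abstract's second insight refers to the mean and variance of image-token entropy. As a minimal sketch of what those two statistics measure, the snippet below computes the Shannon entropy of each token's predictive distribution from raw logits and then summarizes it over a sequence. The function names `token_entropy` and `entropy_stats`, the use of NumPy, and the random logits are illustrative assumptions; this is not the paper's code and says nothing about how EG-GRPO itself selects tokens.

```python
import numpy as np


def token_entropy(logits: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of each row's softmax distribution.

    logits: array of shape (num_tokens, vocab_size).
    Hypothetical helper for illustration, not taken from the paper.
    """
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)


def entropy_stats(logits: np.ndarray) -> tuple[float, float]:
    """Mean and variance of per-token entropy over one generated sequence."""
    h = token_entropy(logits)
    return float(h.mean()), float(h.var())


if __name__ == "__main__":
    # Random logits stand in for a model's image-token predictions (assumption).
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(16, 32))
    mean_h, var_h = entropy_stats(logits)
    print(f"mean entropy = {mean_h:.4f}, variance = {var_h:.4f}")
```

A uniform distribution over V tokens gives the maximum entropy, log V, and a sharply peaked distribution gives an entropy near zero. Under the abstract's claim, sequences with a lower mean and variance of this quantity would tend to receive a higher final reward.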