Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping
Haoyuan Sun, Jing Wang, Yuxin Song, Yu Lu, Bo Fang, Yifu Luo, Jun Yin, Pengyu Zeng, Miao Zhang, Tiantian Zhang, Xueqian Wang, Shijian Lu

TL;DR
This paper introduces Super-Linear Advantage Shaping (SLAS), a novel reinforcement learning method for text-to-image models that enhances training efficiency, robustness, and performance by reshaping policy updates based on advantage-dependent geometry.
Contribution
SLAS extends the Fisher-Rao metric with advantage-dependent weighting, which improves policy updates and mitigates reward hacking when post-training T2I models.
Findings
SLAS outperforms DanceGRPO across multiple benchmarks.
Faster training dynamics and better out-of-domain performance.
Enhanced robustness to model scaling and reduced reward hacking.
Abstract
Recently, post-training methods based on reinforcement learning, particularly Group Relative Policy Optimization (GRPO), have emerged as a robust paradigm for further advancing text-to-image (T2I) models. However, these methods are often prone to reward hacking, in which models exploit biases in imperfect reward functions instead of achieving genuine performance gains. In this work, we identify that GRPO's advantage normalization can lead to miscalibration, and that directly removing the prompt-level standard deviation term yields an optimal policy ascent direction that is linear in the advantage but still limits the separation of genuine signals from noise. To mitigate these issues, we propose Super-Linear Advantage Shaping (SLAS) by revisiting the functional update from an information geometry perspective. By extending the Fisher-Rao information metric with advantage-dependent…
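The miscalibration point can be illustrated with a small numerical sketch. It compares three advantage schemes for one group of rewards per prompt: standard GRPO (subtract the group mean, divide by the group standard deviation), the mean-centered variant with the standard deviation term removed (linear in the reward gap), and a power-law shaping sign(A)|A|^p with p > 1. The power-law function `power_shaped_advantage` and the exponent `p` are hypothetical stand-ins for "super-linear" shaping. They are not the SLAS formula, which the abstract above does not spell out. The reward values are also made up for illustration.

```python
import numpy as np


def grpo_advantage(rewards, eps=1e-6):
    """Standard GRPO advantage: center by the group mean, divide by the group std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def mean_centered_advantage(rewards):
    """Prompt-level std term removed: advantage is linear in the raw reward gap."""
    r = np.asarray(rewards, dtype=float)
    return r - r.mean()


def power_shaped_advantage(rewards, p=2.0):
    """HYPOTHETICAL super-linear shaping sign(A) * |A|**p with p > 1.

    Illustration only; this is not the SLAS update from the paper.
    """
    a = mean_centered_advantage(rewards)
    return np.sign(a) * np.abs(a) ** p


# Made-up rewards: one group differs only by noise, the other carries real signal.
noisy = [0.500, 0.501, 0.499, 0.500]
informative = [0.2, 0.8, 0.3, 0.7]

for name, fn in [("grpo", grpo_advantage),
                 ("linear", mean_centered_advantage),
                 ("power", power_shaped_advantage)]:
    print(name, np.round(fn(noisy), 6), np.round(fn(informative), 6))
```

With std normalization, the near-tie group receives larger advantages (about 1.41 in magnitude) than the informative group (about 1.18), so noise is amplified to the scale of a genuine signal. Removing the std term restores the correct ordering with a ratio that grows linearly in the reward gap (about 300x here). A super-linear map such as p = 2 widens that ratio further (about 90000x), which is the intuition behind separating genuine signals from noise.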
