GDRO: Group-level Reward Post-training Suitable for Diffusion Models
Yiyang Wang, Xi Chen, Xiaogang Xu, Yu Liu, Hengshuang Zhao

TL;DR
GDRO is a post-training method for diffusion models that aligns them with a target reward efficiently and robustly, avoiding the slow online image sampling and the dependence on stochastic samplers that hamper online RL approaches.
Contribution
The paper proposes GDRO, a new offline, group-level reward optimization paradigm tailored for diffusion models, reducing training time and eliminating the need for stochastic sampling.
Findings
GDRO improves reward scores effectively across OCR and GenEval tasks.
It supports fully offline training, which saves significant training time.
GDRO demonstrates robustness against reward hacking.
Abstract
Recent work adapts online reinforcement learning (RL) from LLMs to text-to-image rectified flow diffusion models for reward alignment. Group-level rewards successfully align the model with the targeted reward. However, this approach faces challenges including low efficiency, dependency on stochastic samplers, and reward hacking. The underlying problem is that rectified flow models are fundamentally different from LLMs: 1) For efficiency, online image sampling takes much more time and dominates training time. 2) For stochasticity, rectified flow is deterministic once the initial noise is fixed. To address these problems, and inspired by the effect of group-level rewards in LLMs, we design Group-level Direct Reward Optimization (GDRO). GDRO is a new post-training paradigm for group-level reward alignment that combines the characteristics of rectified flow models. Through rigorous…
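The abstract's point that rectified flow is deterministic once the initial noise is fixed can be illustrated with a minimal sketch. This is not the paper's model or training code: `toy_velocity` is a hypothetical stand-in for a learned velocity field, and the convention of integrating from t=0 (noise) to t=1 (sample) with a plain Euler solver is an assumption made for illustration.

```python
import numpy as np

def toy_velocity(x, t):
    """Hypothetical stand-in for a learned velocity field v(x, t)."""
    target = np.array([2.0, -1.0])
    return target - x  # deterministic function of (x, t); t is unused here

def euler_sample(noise, num_steps=20):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = noise.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * toy_velocity(x, t)
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal(2)

a = euler_sample(noise)                      # first run
b = euler_sample(noise)                      # same noise, second run
other = euler_sample(rng.standard_normal(2)) # a different initial noise

# No randomness enters after the initial noise is drawn, so the two runs
# from the same noise agree exactly, while a new noise gives a new sample.
same_noise_matches = np.array_equal(a, b)
different_noise_differs = not np.allclose(a, other)
print(same_noise_matches, different_noise_differs)
```

Because the only source of variation is the initial noise, an online RL method that needs exploration along the sampling trajectory has to introduce a stochastic sampler, which is the dependency the abstract describes.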
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
