Pareto-Guided Optimal Transport for Multi-Reward Alignment
Ying Ba, Tianyu Zhang, Mohan Zhou, Yalong Bai, Wenyi Mo, Guiwei Zhang, Bing Su, Ji-Rong Wen

TL;DR
This paper introduces PG-OT, a Pareto-guided optimal transport framework for multi-reward alignment in text-to-image models, addressing reward hacking and balancing conflicting objectives.
Contribution
It proposes a novel Pareto frontier-guided optimal transport method with new metrics, improving multi-reward alignment and robustness in preference optimization.
Findings
Outperforms baselines, with an 11% higher JDR.
Achieves a win rate of nearly 80% in human evaluations.
Effectively mitigates reward hacking and balances conflicting rewards.
Abstract
Text-to-image generation models have achieved remarkable progress in preference optimization, yet achieving robust alignment across diverse reward models remains a significant challenge. Existing multi-reward fusion approaches rely on weighted summation, which is costly to tune and insufficient for balancing conflicting objectives. More critically, optimization with reward models is highly susceptible to reward hacking, where reward scores increase while the perceived quality of generated images deteriorates. We demonstrate that optimizing against a unified global target under heterogeneous reward upper bounds can induce reward hacking, a risk further exacerbated by the inherent instability of weak reward models. To mitigate this, we propose a Pareto Frontier-Guided Optimal Transport (PG-OT) framework. Our method constructs a prompt-specific Pareto frontier and maps dominated samples…
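The abstract is cut off before it describes how dominated samples are mapped. What follows is therefore only a generic illustration of the two ingredients it names. It is not the authors' PG-OT implementation. The sketch assumes that each sample generated for a prompt has been scored by several reward models, so every sample carries a reward vector. It then (1) splits the samples into the prompt-specific Pareto frontier and the dominated set, and (2) computes an entropy-regularized optimal-transport plan from the dominated samples to the frontier samples using Sinkhorn iterations. Several choices are assumptions made purely for illustration: squared Euclidean distance in reward space as the transport cost, uniform marginals, and barycentric-projection targets. The helper names `pareto_split`, `sinkhorn`, and `transport_targets` are hypothetical as well.

```python
import math
from typing import Dict, List, Sequence, Tuple

Reward = Sequence[float]


def dominates(a: Reward, b: Reward) -> bool:
    """True if reward vector a Pareto-dominates b: >= on every reward, > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_split(rewards: List[Reward]) -> Tuple[List[int], List[int]]:
    """Split one prompt's sample indices into (frontier, dominated)."""
    frontier, dominated = [], []
    for i, r in enumerate(rewards):
        if any(dominates(other, r) for j, other in enumerate(rewards) if j != i):
            dominated.append(i)
        else:
            frontier.append(i)
    return frontier, dominated


def sinkhorn(cost: List[List[float]], eps: float = 0.1, iters: int = 200) -> List[List[float]]:
    """Entropy-regularized OT plan between uniform marginals (Sinkhorn-Knopp scaling)."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the target marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_targets(rewards: List[Reward], eps: float = 0.1,
                      iters: int = 200) -> Dict[int, Tuple[float, ...]]:
    """Hypothetical: map each dominated sample to a plan-weighted mix of frontier rewards."""
    frontier, dominated = pareto_split(rewards)
    if not dominated:
        return {}
    # Assumed cost: squared Euclidean distance between reward vectors.
    cost = [[sum((x - y) ** 2 for x, y in zip(rewards[i], rewards[j])) for j in frontier]
            for i in dominated]
    scale = max(max(row) for row in cost) or 1.0  # rescale to [0, 1] to avoid exp underflow
    plan = sinkhorn([[c / scale for c in row] for row in cost], eps, iters)
    targets = {}
    for r, i in enumerate(dominated):
        mass = sum(plan[r])
        # Barycentric projection: weighted average of the frontier reward vectors.
        targets[i] = tuple(
            sum(plan[r][c] * rewards[j][k] for c, j in enumerate(frontier)) / mass
            for k in range(len(rewards[i]))
        )
    return targets


if __name__ == "__main__":
    # Two reward models (e.g. aesthetics, text alignment) scoring five samples of one prompt.
    scores = [(0.9, 0.2), (0.5, 0.6), (0.2, 0.9), (0.4, 0.5), (0.1, 0.1)]
    print(pareto_split(scores))  # ([0, 1, 2], [3, 4])
    print(transport_targets(scores))
```

Unlike a weighted sum, which collapses all rewards into one scalar and requires its weights to be tuned, Pareto dominance compares samples without any weights. The OT step then decides which frontier samples each dominated sample is pulled toward. The visible part of the abstract does not say whether the paper transports reward vectors, images, or latents, nor which cost it uses.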
