Pareto-Guided Optimal Transport for Multi-Reward Alignment

Ying Ba; Tianyu Zhang; Mohan Zhou; Yalong Bai; Wenyi Mo; Guiwei Zhang; Bing Su; Ji-Rong Wen

arXiv:2605.13155 · cs.CV · May 14, 2026

TL;DR

This paper introduces PG-OT, a Pareto-guided optimal transport framework for multi-reward alignment in text-to-image models, addressing reward hacking and balancing conflicting objectives.

Contribution

The paper proposes a Pareto frontier-guided optimal transport method, together with new metrics, that improves multi-reward alignment and the robustness of preference optimization.
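The sketch below uses a standard non-dominated filter to illustrate what a prompt-specific Pareto frontier is. It assumes that every reward is higher-is-better and that all candidate images for one prompt have already been scored by each reward model. The function `pareto_front_mask` and the example scores are hypothetical and do not come from the paper.

```python
import numpy as np


def pareto_front_mask(rewards: np.ndarray) -> np.ndarray:
    """Boolean mask of the non-dominated rows of `rewards`.

    rewards: shape (n_samples, n_rewards); every column is assumed higher-is-better.
    Sample i is dominated if another sample is >= on every reward and > on at least one.
    """
    n = rewards.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        at_least_as_good = np.all(rewards >= rewards[i], axis=1)
        strictly_better = np.any(rewards > rewards[i], axis=1)
        # A row never dominates itself because strictly_better[i] is False.
        if np.any(at_least_as_good & strictly_better):
            mask[i] = False
    return mask


# Hypothetical scores for five images of one prompt under two reward models.
rewards = np.array([
    [0.9, 0.2],
    [0.6, 0.6],
    [0.3, 0.8],
    [0.5, 0.5],
    [0.2, 0.3],
])
print(pareto_front_mask(rewards).tolist())  # [True, True, True, False, False]
```

In this example the first three images form the frontier and the last two are dominated. For instance, `[0.6, 0.6]` beats `[0.5, 0.5]` on both rewards. A weighted sum would instead collapse each row to a single number whose ranking depends on the chosen weights. The filter keeps every trade-off that no other sample beats outright.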

Findings

1. Outperforms baselines with an 11% higher JDR.
2. Achieves a win rate of nearly 80% in human evaluations.
3. Effectively mitigates reward hacking and balances conflicting rewards.

Abstract

Text-to-image generation models have achieved remarkable progress in preference optimization, yet achieving robust alignment across diverse reward models remains a significant challenge. Existing multi-reward fusion approaches rely on weighted summation, which is costly to tune and insufficient for balancing conflicting objectives. More critically, optimization with reward models is highly susceptible to reward hacking, where reward scores increase while the perceived quality of generated images deteriorates. We demonstrate that optimizing against a unified global target under heterogeneous reward upper bounds can induce reward hacking, a risk further exacerbated by the inherent instability of weak reward models. To mitigate this, we propose a Pareto Frontier-Guided Optimal Transport (PG-OT) framework. Our method constructs a prompt-specific Pareto frontier and maps dominated samples…
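The abstract is cut off where it describes how dominated samples are mapped, so this page does not reveal the paper's exact mapping. As a generic illustration only, the sketch below uses entropy-regularized optimal transport (Sinkhorn iterations) to couple dominated samples with frontier samples in reward space. It then takes each dominated sample's plan-weighted average of frontier points (its barycentric projection) as a target. The following are all assumptions and are not details of PG-OT: the squared-Euclidean cost, the uniform marginals, `eps`, and the function names `sinkhorn_plan` and `transport_targets`.

```python
import numpy as np


def sinkhorn_plan(source, target, eps=0.05, n_iters=500):
    """Entropy-regularized OT plan between two point clouds with uniform weights.

    source: (m, d) reward vectors of dominated samples.
    target: (k, d) reward vectors of frontier samples.
    Returns P of shape (m, k); columns sum to 1/k and rows approximately to 1/m.
    """
    m, k = source.shape[0], target.shape[0]
    a = np.full(m, 1.0 / m)  # uniform mass on dominated samples
    b = np.full(k, 1.0 / k)  # uniform mass on frontier samples
    # Squared Euclidean cost between every (dominated, frontier) pair.
    cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-cost / eps)
    u = np.ones(m)
    v = np.ones(k)
    for _ in range(n_iters):
        u = a / (kernel @ v)    # rescale rows toward marginal a
        v = b / (kernel.T @ u)  # rescale columns to marginal b
    return u[:, None] * kernel * v[None, :]


def transport_targets(source, target, eps=0.05):
    """Barycentric projection: each dominated sample's target is the
    plan-weighted average of the frontier points it is coupled with."""
    plan = sinkhorn_plan(source, target, eps)
    return (plan @ target) / plan.sum(axis=1, keepdims=True)


# Hypothetical reward vectors for one prompt (two reward models).
frontier = np.array([[0.9, 0.2], [0.6, 0.6], [0.3, 0.8]])  # non-dominated
dominated = np.array([[0.5, 0.5], [0.2, 0.3]])            # dominated
print(transport_targets(dominated, frontier).round(3))
```

In this sketch the target marginal is uniform, so every frontier point must receive mass and the dominated samples are not all pulled toward a single trade-off. Nearest-neighbor matching gives no such guarantee. A smaller `eps` brings the plan closer to unregularized OT but needs more iterations to converge.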
