Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs
Yunhong Lu, Qichao Wang, Hengyuan Cao, Xiaoyin Xu, Min Zhang

TL;DR
This paper introduces PNAPO, a preference optimization method for rectified flow models that uses prior noise information and trajectory interpolation to improve alignment and efficiency.
Contribution
PNAPO is a novel off-policy preference optimization framework that leverages prior noise data and trajectory interpolation for rectified flow models.
Findings
PNAPO improves preference metrics on RF text-to-image models.
PNAPO reduces training compute significantly.
Trajectory interpolation constrains optimization, enhancing stability.
Abstract
Existing preference datasets for text-to-image models typically store only the final winner/loser images. This representation is insufficient for rectified flow (RF) models, whose generation is naturally indexed by a specific prior noise sample and follows a nearly straight denoising trajectory. In contrast, prior DPO-style alignment for diffusion models commonly estimates trajectories using an independent forward noising process, which can be mismatched to the true reverse dynamics and introduces unnecessary variance. We propose Prior Noise-Aware Preference Optimization (PNAPO), an off-policy alignment framework specialized for rectified flow. PNAPO augments preference data by retaining the paired prior noises used to generate each winner/loser image, turning the standard (prompt, winner, loser) triplet into a sextuple. Leveraging the straight-line property of RF, we estimate…
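The difference between interpolating along a stored prior noise and re-noising with an independent sample can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names, the array shapes, and the time convention (t = 0 at the prior noise, t = 1 at the image) are assumptions made here, and the abstract's truncated sentence is not filled in.

```python
import numpy as np

def interpolate_on_line(noise, image, t):
    """Point at time t on the straight segment from noise (t=0) to image (t=1).

    Assumed convention; some rectified-flow codebases reverse the direction.
    """
    return (1.0 - t) * noise + t * image

def line_velocity(noise, image):
    """Constant velocity of the straight segment from noise to image."""
    return image - noise

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a winner image and the prior noise that generated it.
prior_noise = rng.standard_normal((4, 4))
winner_image = rng.standard_normal((4, 4))

t = 0.25

# Noise-tracked estimate: reuse the stored prior noise for this image.
x_t_tracked = interpolate_on_line(prior_noise, winner_image, t)

# Independent forward noising: draw fresh noise unrelated to how the image was made.
fresh_noise = rng.standard_normal((4, 4))
x_t_independent = interpolate_on_line(fresh_noise, winner_image, t)

# Following the constant velocity for the remaining time recovers the image exactly
# on the tracked segment.
recovered = x_t_tracked + (1.0 - t) * line_velocity(prior_noise, winner_image)
```

Both intermediate states end at the same image, but only the tracked one lies on the segment tied to the noise that actually produced it, which is the mismatch the abstract attributes to independent forward noising.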
