TL;DR
This paper introduces ViPO, a large-scale preference dataset, and Poly-DPO, an adaptive optimization method, to improve visual generative models by effectively handling noisy and diverse data.
Contribution
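The paper contributes two things: ViPO, a large-scale visual preference dataset built to address the low resolution, limited prompt diversity, and imbalanced distributions of existing datasets; and Poly-DPO, which adds a polynomial term to the DPO objective so that model confidence adapts to dataset characteristics, making preference learning robust to noisy, conflicting labels.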
Findings
Poly-DPO outperforms existing methods on noisy datasets.
Models trained on ViPO outperform those trained on smaller datasets.
Poly-DPO converges to standard DPO with high-quality data.
Abstract
While preference optimization is crucial for improving visual generative models, how to effectively scale this paradigm remains largely unexplored. Current open-source preference datasets contain conflicting preference patterns, where winners excel in some dimensions but underperform in others. Naively optimizing on such noisy datasets fails to learn preferences, hindering effective scaling. To enhance robustness against noise, we propose Poly-DPO, which extends the DPO objective with an additional polynomial term that dynamically adjusts model confidence based on dataset characteristics, enabling effective learning across diverse data distributions. Beyond biased patterns, existing datasets suffer from low resolution, limited prompt diversity, and imbalanced distributions. To facilitate large-scale visual preference optimization by tackling data bottlenecks, we construct ViPO, a…
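As a rough illustration of what "extending the DPO objective with an additional polynomial term" could look like, the sketch below computes the standard DPO loss for a single preference pair and adds one extra term weighted by a coefficient. The standard DPO part follows the well-known formulation; the specific form of the extra term, `epsilon * (1 - p)` (borrowed from the Poly-1 style of loss), and the names `poly_dpo_loss`, `dpo_margin`, `epsilon`, and `beta` are assumptions made for illustration, since the summary above does not give the paper's exact formula.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_margin(policy_logp_w: float, policy_logp_l: float,
               ref_logp_w: float, ref_logp_l: float,
               beta: float = 0.1) -> float:
    """Scaled implicit-reward margin between winner (w) and loser (l).

    Standard DPO: beta * [(log pi(w) - log pi_ref(w)) - (log pi(l) - log pi_ref(l))].
    """
    return beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))


def poly_dpo_loss(margin: float, epsilon: float = 0.0) -> float:
    """DPO loss plus a hypothetical first-order polynomial term.

    ASSUMPTION: the epsilon * (1 - p) term is an illustrative stand-in,
    not the paper's confirmed formula. With epsilon = 0 this is exactly
    the standard DPO loss, -log(sigmoid(margin)).
    """
    p = math.exp(log_sigmoid(margin))  # model's confidence that w beats l
    return -log_sigmoid(margin) + epsilon * (1.0 - p)


# Example: one preference pair with made-up log-probabilities.
m = dpo_margin(-1.0, -2.0, -1.5, -1.5, beta=0.1)
standard = poly_dpo_loss(m, epsilon=0.0)   # plain DPO
adjusted = poly_dpo_loss(m, epsilon=1.0)   # extra weight on low-confidence pairs
```

In this sketch, a positive `epsilon` adds weight to pairs the model is unsure about and a negative one reduces it, so a single coefficient changes how strongly the objective pushes on uncertain (possibly noisy) pairs. Setting `epsilon` to zero recovers standard DPO, which is consistent in spirit with the finding that Poly-DPO converges to standard DPO on high-quality data.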