Scalable Ranked Preference Optimization for Text-to-Image Generation

Shyamgopal Karthik; Huseyin Coskun; Zeynep Akata; Sergey Tulyakov; Jian Ren; Anil Kag

arXiv:2410.18013·cs.CV·October 31, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces a scalable, fully synthetic dataset generation method for preference optimization in text-to-image models, improving alignment and quality without extensive human labeling.

Contribution

The authors propose RankDPO, a novel method utilizing synthetic preferences and ranking feedback to enhance text-to-image model alignment efficiently.

Findings

01 · Synthetic preference datasets improve model prompt-following.

02 · RankDPO enhances visual quality and alignment.

03 · Scalable approach reduces reliance on human annotations.

Abstract

Direct Preference Optimization (DPO) has emerged as a powerful approach to align text-to-image (T2I) models with human feedback. Unfortunately, successful application of DPO to T2I models requires a huge amount of resources to collect and label large-scale datasets, e.g., millions of generated paired images annotated with human preferences. In addition, these human preference datasets can get outdated quickly as the rapid improvements of T2I models lead to higher quality images. In this work, we investigate a scalable approach for collecting large-scale and fully synthetic datasets for DPO training. Specifically, the preferences for paired images are generated using a pre-trained reward function, eliminating the need for involving humans in the annotation process, greatly improving the dataset collection efficiency. Moreover, we demonstrate that such datasets allow averaging predictions…
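
The labeling step described in the abstract, where a pre-trained reward function rather than a human decides which image is preferred, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `label_preferences` and `toy_reward` are hypothetical helpers, images are stood in for by caption strings, and the toy reward simply counts prompt words. It is not the authors' pipeline or any library's API.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def label_preferences(
    prompt: str,
    images: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Return (win, lose) index pairs and a best-to-worst ranking.

    `reward_fn` is a hypothetical stand-in for a pre-trained reward model
    that scores how well an image matches the prompt.
    """
    scores = [reward_fn(prompt, img) for img in images]

    pairs = []
    for i, j in combinations(range(len(images)), 2):
        if scores[i] == scores[j]:
            continue  # ties carry no preference signal, so skip them
        win, lose = (i, j) if scores[i] > scores[j] else (j, i)
        pairs.append((win, lose))

    # Sorting by score turns the same scores into a ranked list.
    ranking = sorted(range(len(images)), key=lambda k: scores[k], reverse=True)
    return pairs, ranking


def toy_reward(prompt: str, image: str) -> float:
    """Toy reward (assumption): count prompt words found in the caption."""
    return float(sum(word in image for word in prompt.split()))


if __name__ == "__main__":
    candidates = ["a red cube", "a blue cube", "a dog"]
    pairs, ranking = label_preferences("red cube", candidates, toy_reward)
    print(pairs)
    print(ranking)
```

With more than two candidates per prompt, the same scores yield both pairwise labels and a full ranking, which is the kind of ranked feedback the Contribution section attributes to RankDPO.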

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 4

Strengths

- The overall presentation of this paper is clear.
- I like the idea of extending existing human preference datasets from (win, lose) tuples to more items, which naturally leads to a rank-based preference dataset.
- The experiments are conducted on both big and medium-sized models, SDXL and SD3-Medium, which is adequate. The comparison experiments with other models fine-tuned with DPO and different preference datasets show that the proposed new dataset and method have better performance than ot

Weaknesses

- Overall, I like the idea of combining a rank-based preference dataset with DPO, which should provide more fine-grained preference guidance than (win, lose)-style DPO and datasets. My only concern is that it is unclear whether the performance improvement of the proposed RankDPO comes mainly from the new synthetic dataset or from the rank-based DPO. According to my understanding, in Table 3, DPO-SDXL, MaPO SDXL, SPO SDXL, and the proposed RankDPO SDXL are fine-tuned with different preference dat

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. The writing of this paper is clear and easy to follow. It's easy to understand the main contributions of this paper as I summarized above.
2. The contributions of this paper are solid. [Method contribution] The authors propose a new performance optimization method, RankDPO, and its effectiveness is well supported by extensive experiments on the SDXL and SD3-Medium models and the GenEval and T2I-CompBench benchmarks. [Dataset contribution] To make the ranking-based preference optimization work, the authors c

Weaknesses

No obvious weaknesses. Please check the questions section.

Reviewer 03 · Rating 5 · Confidence 4

Strengths

+ Scalable and cost-effective dataset collection with synthetic preferences.
+ The proposed RankDPO framework effectively leverages ranked feedback to improve text-image alignment.
+ Experiments on benchmark datasets could verify the effectiveness of the proposed method.

Weaknesses

- Lack of novelty: Existing works (Wallace et al. 2024, Liu et al. 2024c) have proposed diffusion-based DPO and list-wise ranking-based DPO. It seems that the proposed method combines them to tune diffusion models.
- More baseline methods, especially those used to generate images during the construction of Syn-Pic, should be considered and compared in Tab 1, Tab 2, and Tab 3.
- The performance improvement gained by RankDPO over DPO (5 Rewards) in Tab 4 seems marginal. The effective

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: ALIGN · Direct Preference Optimization