Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Florinel-Alin Croitoru; Vlad Hondru; Radu Tudor Ionescu; Nicu Sebe; Mubarak Shah

arXiv:2405.13637·cs.CV·May 12, 2025

TL;DR

This paper introduces Curriculum DPO, a two-stage training method for text-to-image models that uses curriculum learning to improve alignment and aesthetics, outperforming existing fine-tuning approaches.

Contribution

The paper presents a curriculum learning framework for DPO that samples training pairs by difficulty to improve text-to-image generation quality.

Findings

01. Outperforms state-of-the-art fine-tuning methods on nine benchmarks.
02. Improves text alignment, aesthetics, and human preference.
03. Uses rank difference as a measure of training pair difficulty.

Abstract

Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). In this paper, we propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our method is divided into two training stages. First, a ranking of the examples generated for each prompt is obtained by employing a reward model. Then, increasingly difficult pairs of examples are sampled and provided to a text-to-image generative (diffusion or consistency) model. Generated samples that are far apart in the ranking are considered to form easy pairs, while those that are close in the ranking form hard pairs. In other words, we use the rank difference between samples as a measure of difficulty. The sampled pairs are split into batches according to their difficulty levels, which are gradually used to…
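
The ranking and pair-grouping steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `rank_samples` and `curriculum_pairs`, the `num_levels` parameter, and the equal-width bucketing of rank differences are all hypothetical, and the reward values are made-up numbers standing in for a reward model's scores. For simplicity the sketch enumerates every pair for a prompt, whereas the abstract speaks of sampling pairs.

```python
from itertools import combinations


def rank_samples(rewards):
    """Return sample indices sorted from highest to lowest reward."""
    return sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)


def curriculum_pairs(rewards, num_levels):
    """Group (preferred, rejected) index pairs into difficulty levels, easy first.

    Difficulty is the rank difference between the two samples: a large
    difference is an easy pair, a small difference is a hard pair.
    Equal-width bucketing of rank differences is an assumption of this sketch.
    """
    order = rank_samples(rewards)
    max_diff = len(order) - 1
    levels = [[] for _ in range(num_levels)]
    # combinations() yields positions with pos_w < pos_l, so the sample at
    # pos_w is always ranked above the sample at pos_l.
    for pos_w, pos_l in combinations(range(len(order)), 2):
        diff = pos_l - pos_w  # rank difference, between 1 and max_diff
        # The largest rank differences land in level 0 (easiest).
        level = (max_diff - diff) * num_levels // max_diff
        levels[level].append((order[pos_w], order[pos_l]))
    return levels


# Hypothetical reward-model scores for four images generated from one prompt.
rewards = [0.2, 0.9, 0.5, 0.7]
levels = curriculum_pairs(rewards, num_levels=3)
for level, pairs in enumerate(levels):
    print(level, pairs)
```

In this sketch, `levels[0]` holds the easiest pairs (samples far apart in the ranking) and the last level holds the hardest (adjacent in the ranking). A curriculum would feed the levels to the preference-optimization step in that order.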

Code & Models

Repositories

croitorualin/curriculum-dpo (PyTorch, official)

Models

acroitoru/curriculum-dpo-loras (Hugging Face model)

Taxonomy

Topics: Logic, Reasoning, and Knowledge · Service-Oriented Architecture and Web Services · Cloud Computing and Resource Management

Methods: Direct Preference Optimization