Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning

Yanting Miao; William Loh; Suraj Kothawade; Pascal Poupart; Abdullah Rashwan; Yeqing Li

arXiv:2407.12164·cs.CV·December 24, 2024

TL;DR

This paper introduces a preference-based reinforcement learning method for subject-driven text-to-image generation that improves training efficiency and model regularization without requiring complex text encoder training.

Contribution

It proposes the $\lambda$-Harmonic reward function and Reward Preference Optimization (RPO), enabling faster, more efficient subject-driven image synthesis with fewer negative samples and no need for text encoder optimization.
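
As a rough illustration of what a $\lambda$-Harmonic reward could look like, the sketch below assumes it is a weighted harmonic mean of two similarity scores: one measuring fidelity to the reference images and one measuring alignment with the text prompt. This page does not give the formula, so the functional form, the function name `harmonic_reward`, and the parameters `image_sim`, `text_sim`, and `lam` are all assumptions made for illustration.

```python
def harmonic_reward(image_sim: float, text_sim: float, lam: float = 0.5) -> float:
    """Weighted harmonic mean of two similarity scores (assumed form).

    image_sim: similarity between the generated image and the reference images.
    text_sim:  similarity between the generated image and the text prompt.
    lam:       weight in [0, 1]; larger values emphasise image similarity.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # A harmonic mean is undefined for non-positive scores; treat them as zero reward.
    if image_sim <= 0.0 or text_sim <= 0.0:
        return 0.0
    return 1.0 / (lam / image_sim + (1.0 - lam) / text_sim)


# Hypothetical scores: an image that copies the subject but ignores the prompt.
overfit = harmonic_reward(image_sim=0.9, text_sim=0.1)
balanced = harmonic_reward(image_sim=0.6, text_sim=0.4)
print(overfit, balanced)
```

Under this assumption, a harmonic mean is pulled toward the smaller of the two scores, so an image that matches the reference closely but ignores the prompt scores lower than a balanced one. That property is one plausible reason such a reward could act as a regularizer against overfitting.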

Findings

01. Achieves a state-of-the-art CLIP-I score of 0.833 on DreamBench.

02. Requires only 3% of the negative samples used by DreamBooth.

03. Provides reliable model selection via the $\lambda$-Harmonic reward function.

Abstract

Text-to-image generative models have recently attracted considerable interest, enabling the synthesis of high-quality images from textual prompts. However, these models often lack the capability to generate specific subjects from given reference images or to synthesize novel renditions under varying conditions. Methods like DreamBooth and Subject-driven Text-to-Image (SuTI) have made significant progress in this area. Yet both approaches primarily focus on enhancing similarity to reference images and require expensive setups, often overlooking the need for efficient training and for avoiding overfitting to the reference images. In this work, we present the $\lambda$-Harmonic reward function, which provides a reliable reward signal and enables early stopping for faster training and effective regularization. By combining the Bradley-Terry preference model, the $\lambda$-Harmonic reward…
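
The two ingredients named in the abstract, a Bradley-Terry preference model and reward-based early stopping, can be sketched in a few lines. The Bradley-Terry formula below is the standard one; how the paper turns it into a training loss is not shown on this page, and the patience-based stopping rule, the helper names `bradley_terry_prob` and `should_stop`, and the `patience` parameter are assumptions chosen for illustration rather than the authors' procedure.

```python
import math


def bradley_terry_prob(reward_a: float, reward_b: float) -> float:
    """Probability that sample a is preferred over sample b.

    Standard Bradley-Terry form:
    exp(r_a) / (exp(r_a) + exp(r_b)) = sigmoid(r_a - r_b).
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def should_stop(reward_history: list[float], patience: int = 3) -> bool:
    """Assumed stopping rule: stop once the last `patience` evaluations
    have failed to beat the best reward seen before them."""
    if len(reward_history) <= patience:
        return False
    best_before = max(reward_history[:-patience])
    return max(reward_history[-patience:]) <= best_before


# Hypothetical rewards for two generated images and a validation-reward trace.
print(bradley_terry_prob(0.8, 0.2))
print(should_stop([0.50, 0.40, 0.40, 0.30]))
```

Because the preference probability depends only on the difference between the two rewards, any scalar reward with a meaningful ordering can supply preference labels without human annotation.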

Code & Models

Repositories

andrew-miao/RPO
PyTorch · Official

Videos

Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning · SlidesLive

Taxonomy

Topics: Video Analysis and Summarization

Methods: Max Pooling · Concatenated Skip Connection · Convolution · U-Net · Focus · Early Stopping