Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning
Yanting Miao, William Loh, Suraj Kothawade, Pascal Poupart, Abdullah Rashwan, Yeqing Li

TL;DR
This paper introduces a preference-based reinforcement learning method for subject-driven text-to-image generation that improves training efficiency and model regularization without requiring complex text encoder training.
Contribution
It proposes the $\lambda$-Harmonic reward function and Reward Preference Optimization (RPO), enabling faster, more efficient subject-driven image synthesis with fewer negative samples and no need for text encoder optimization.
Findings
Achieves a state-of-the-art CLIP-I score of 0.833 on DreamBench.
Requires only 3% of the negative samples used by DreamBooth.
Provides reliable model selection via the $\lambda$-Harmonic reward function.
Abstract
Text-to-image generative models have recently attracted considerable interest, enabling the synthesis of high-quality images from textual prompts. However, these models often lack the capability to generate specific subjects from given reference images or to synthesize novel renditions under varying conditions. Methods like DreamBooth and Subject-driven Text-to-Image (SuTI) have made significant progress in this area. Yet, both approaches primarily focus on enhancing similarity to reference images and require expensive setups, often overlooking the need for efficient training and avoiding overfitting to the reference images. In this work, we present the $\lambda$-Harmonic reward function, which provides a reliable reward signal and enables early stopping for faster training and effective regularization. By combining the Bradley-Terry preference model, the $\lambda$-Harmonic reward…
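The abstract names two ingredients, a $\lambda$-Harmonic reward and the Bradley-Terry preference model, without giving formulas in this excerpt. The sketch below is illustrative only. It assumes the reward is a $\lambda$-weighted harmonic mean of an image-similarity score and a text-alignment score, which is an assumption and not a definition taken from the text above. The Bradley-Terry probability is the standard form, a sigmoid of the reward difference. The function names `harmonic_reward` and `bradley_terry_prob` are hypothetical.

```python
import math


def harmonic_reward(image_sim: float, text_sim: float, lam: float) -> float:
    """Weighted harmonic mean of two positive scores.

    ASSUMPTION: this form is a guess at what a "lambda-Harmonic" reward
    could look like; the excerpt above does not state the formula.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if image_sim <= 0.0 or text_sim <= 0.0:
        raise ValueError("scores must be positive for a harmonic mean")
    return 1.0 / (lam / image_sim + (1.0 - lam) / text_sim)


def bradley_terry_prob(reward_a: float, reward_b: float) -> float:
    """Standard Bradley-Terry probability that sample A is preferred over B."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))


if __name__ == "__main__":
    # Hypothetical scores for two generated images.
    r_good = harmonic_reward(image_sim=0.85, text_sim=0.80, lam=0.5)
    r_poor = harmonic_reward(image_sim=0.95, text_sim=0.20, lam=0.5)
    print(r_good, r_poor, bradley_terry_prob(r_good, r_poor))
```

A harmonic mean is pulled toward the smaller of its two inputs, so under this assumed form an image that copies the reference closely but ignores the prompt scores lower than one that balances both. That behaviour is consistent with the abstract's claim that the reward helps guard against overfitting to the reference images.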
Taxonomy
Topics: Video Analysis and Summarization
Methods: Max Pooling · Concatenated Skip Connection · Convolution · U-Net · Focus · Early Stopping
