InstructBooth: Instruction-following Personalized Text-to-Image Generation
Daewon Chae, Nokyung Park, Jinkyu Kim, Kimin Lee

TL;DR
InstructBooth improves image-text alignment in personalized text-to-image generation by first personalizing the model on a few subject images and then fine-tuning it with reinforcement learning, without sacrificing personalization ability.
Contribution
The paper introduces a reinforcement learning-based fine-tuning stage that enhances image-text alignment in personalized text-to-image models while preserving their personalization ability.
Findings
Outperforms existing baselines in image-text alignment.
Maintains high personalization ability.
Human evaluations show superior performance.
Abstract
Personalizing text-to-image models using a limited set of images for a specific object has been explored in subject-specific image generation. However, existing methods often face challenges in aligning with text prompts due to overfitting to the limited training images. In this work, we introduce InstructBooth, a novel method designed to enhance image-text alignment in personalized text-to-image models without sacrificing the personalization ability. Our approach first personalizes text-to-image models with a small number of subject-specific images using a unique identifier. After personalization, we fine-tune personalized text-to-image models using reinforcement learning to maximize a reward that quantifies image-text alignment. Additionally, we propose complementary techniques to increase the synergy between these two processes. Our method demonstrates superior image-text alignment…
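The reward-maximization step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it replaces the personalized diffusion model with a softmax policy over four hypothetical candidate images, and the alignment reward with a fixed, made-up score table (`ALIGNMENT_SCORES`). The function names and hyperparameters are likewise assumptions chosen for illustration. It only shows the general idea of a policy-gradient (REINFORCE) update that shifts probability toward samples with higher reward.

```python
import math
import random

# Hypothetical image-text alignment scores for four candidate outputs.
# In a real system these would come from a learned reward model.
ALIGNMENT_SCORES = [0.1, 0.3, 0.9, 0.2]


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(logits):
    """Expected alignment score under the current toy policy."""
    return sum(p * r for p, r in zip(softmax(logits), ALIGNMENT_SCORES))


def reinforce_finetune(logits, reward_fn, steps=3000, lr=0.1, seed=0):
    """Toy REINFORCE loop: sample an output, score it, nudge the policy."""
    rng = random.Random(seed)
    logits = list(logits)
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        reward = reward_fn(action)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. logits is onehot(action) - probs.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return logits


# Start from a uniform policy, standing in for the model after personalization.
initial_logits = [0.0, 0.0, 0.0, 0.0]
tuned_logits = reinforce_finetune(initial_logits, lambda a: ALIGNMENT_SCORES[a])
```

Comparing `expected_reward(initial_logits)` with `expected_reward(tuned_logits)` shows how the update raises the average alignment score of sampled outputs. The sketch omits the personalization stage and the complementary techniques the abstract mentions.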
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Sparse Evolutionary Training
