Subject-driven Text-to-Image Generation via Apprenticeship Learning
Wenhu Chen, Hexiang Hu, Yandong Li, Nataniel Ruiz, Xuhui Jia, Ming-Wei Chang, William W. Cohen

TL;DR
SuTI is a fast subject-driven text-to-image generator. It uses apprenticeship learning to imitate subject-specific expert models, so it can produce high-quality, customized images of a new subject instantly, without subject-specific fine-tuning.
Contribution
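The paper presents SuTI, which replaces expensive subject-specific fine-tuning with in-context learning. The model is trained through apprenticeship learning, and at inference it needs only a few demonstrations of a new subject, which significantly speeds up subject-specific image generation.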
Findings
SuTI generates images 20x faster than state-of-the-art optimization-based methods.
Human evaluation shows SuTI outperforms existing models on subject and text alignment.
A single SuTI apprentice model learns from data generated by a massive number of subject-specific expert models, drawing on millions of image clusters mined from the Internet.
Abstract
Recent text-to-image generation models like DreamBooth have made remarkable progress in generating highly customized images of a target subject by fine-tuning an "expert model" for a given subject from a few examples. However, this process is expensive, since a new expert model must be learned for each subject. In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine-tuning with in-context learning. Given a few demonstrations of a new subject, SuTI can instantly generate novel renditions of the subject in different scenes, without any subject-specific optimization. SuTI is powered by apprenticeship learning, where a single apprentice model is learned from data generated by a massive number of subject-specific expert models. Specifically, we mine millions of image clusters from the Internet, each centered around a specific visual…
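The data flow described in the abstract can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: strings stand in for images, and every name here (`Cluster`, `ApprenticeExample`, `train_expert`, `build_apprentice_data`, `num_demos`) is hypothetical. It only shows how one expert per cluster could supply targets for a single apprentice that sees a few demonstrations in context.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cluster:
    """A group of images sharing one subject (strings stand in for images)."""
    subject: str
    images: List[str]


@dataclass
class ApprenticeExample:
    """One training example for the single apprentice model."""
    demonstrations: List[str]  # a few images of the subject, given in context
    prompt: str                # text describing the new scene
    target: str                # output of that subject's expert model


def train_expert(cluster: Cluster) -> Callable[[str], str]:
    """Toy stand-in for fine-tuning one expert model on one cluster."""
    def expert(prompt: str) -> str:
        # A real expert would render an image; here we return a label.
        return f"<image of {cluster.subject}: {prompt}>"
    return expert


def build_apprentice_data(
    clusters: List[Cluster], prompts: List[str], num_demos: int = 3
) -> List[ApprenticeExample]:
    """Collect (demonstrations, prompt, expert output) triples over all clusters."""
    examples: List[ApprenticeExample] = []
    for cluster in clusters:
        expert = train_expert(cluster)        # one expert per subject
        demos = cluster.images[:num_demos]    # the in-context demonstrations
        for prompt in prompts:
            examples.append(ApprenticeExample(demos, prompt, expert(prompt)))
    return examples
```

In this sketch the apprentice would then be trained to map `demonstrations` and `prompt` to `target`, so that a new subject at inference time needs only its demonstrations and no per-subject optimization.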
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques
