Subject-driven Text-to-Image Generation via Apprenticeship Learning

Wenhu Chen; Hexiang Hu; Yandong Li; Nataniel Ruiz; Xuhui Jia; Ming-Wei Chang; William W. Cohen

arXiv:2304.00186 · cs.CV · October 3, 2023 · 46 cites

TL;DR

SuTI is a fast subject-driven text-to-image generator. It uses apprenticeship learning to imitate subject-specific expert models, so it can instantly produce high-quality, customized images of a new subject without subject-specific fine-tuning.

Contribution

The paper presents SuTI, a generator that replaces expensive subject-specific fine-tuning with in-context learning. A single apprentice model is trained via apprenticeship learning on data produced by many expert models, which significantly speeds up subject-specific image generation.

Findings

01

SuTI generates images 20x faster than state-of-the-art optimization-based methods.

02

Human evaluation shows SuTI outperforms existing models on subject and text alignment.

03

SuTI effectively imitates expert models trained on millions of subject-specific image clusters.

Abstract

Recent text-to-image generation models like DreamBooth have made remarkable progress in generating highly customized images of a target subject, by fine-tuning an "expert model" for a given subject from a few examples. However, this process is expensive, since a new expert model must be learned for each subject. In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine-tuning with in-context learning. Given a few demonstrations of a new subject, SuTI can instantly generate novel renditions of the subject in different scenes, without any subject-specific optimization. SuTI is powered by apprenticeship learning, where a single apprentice model is learned from data generated by a massive number of subject-specific expert models. Specifically, we mine millions of image clusters from the Internet, each centered around a specific visual…
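The apprenticeship-learning setup described in the abstract can be pictured with a toy data-flow sketch. This is not the authors' pipeline: the abstract is truncated here, so the names `Cluster`, `ApprenticeExample`, `fine_tune_expert`, and `build_apprentice_dataset`, the use of strings as stand-ins for images, and the choice of `k` demonstrations per subject are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cluster:
    """A mined image cluster centered on one subject (images are placeholder ids)."""
    subject: str
    images: List[str]


@dataclass
class ApprenticeExample:
    """One training tuple for the single apprentice model."""
    demonstrations: List[str]  # a few images of the subject, given in context
    prompt: str                # text describing the desired scene
    target: str                # output of the subject-specific expert


def fine_tune_expert(cluster: Cluster) -> Callable[[str], str]:
    """Hypothetical stand-in for fine-tuning one expert model per subject."""
    def expert(prompt: str) -> str:
        # A real expert would return an image; a string keeps the sketch runnable.
        return f"<image of {cluster.subject}: {prompt}>"
    return expert


def build_apprentice_dataset(
    clusters: List[Cluster], prompts: List[str], k: int = 3
) -> List[ApprenticeExample]:
    """Collect expert outputs so one apprentice can learn to imitate all experts."""
    dataset = []
    for cluster in clusters:
        expert = fine_tune_expert(cluster)  # one expert per subject
        demos = cluster.images[:k]          # assumed: first k images act as demonstrations
        for prompt in prompts:
            dataset.append(ApprenticeExample(demos, prompt, expert(prompt)))
    return dataset


clusters = [
    Cluster("corgi", ["c1", "c2", "c3", "c4"]),
    Cluster("teapot", ["t1", "t2"]),
]
prompts = ["on a beach", "in the snow"]
dataset = build_apprentice_dataset(clusters, prompts)
```

The point of the sketch is the shape of the data: the per-subject cost (fine-tuning an expert) is paid only while building the dataset, whereas the apprentice sees the subject only through the in-context demonstrations, which is why no subject-specific optimization is needed at generation time.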

Videos

Subject-driven Text-to-Image Generation via Apprenticeship Learning · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques