Controllable Textual Inversion for Personalized Text-to-Image Generation

Jianan Yang; Haobo Wang; Yanming Zhang; Ruixuan Xiao; Sai Wu; Gang Chen; Junbo Zhao

arXiv:2304.05265·cs.CV·September 26, 2023·1 cites

TL;DR

This paper introduces Controllable Textual Inversion (COTI), a robust, data-efficient method for personalized text-to-image generation that improves upon existing techniques by addressing key limitations with a novel loss and active learning.

Contribution

COTI enhances text inversion for personalized image generation by providing a theoretically-guided, active-learning-based framework that reduces data needs and improves robustness.

Findings

01. COTI achieves a 26.05 decrease in FID score.
02. COTI boosts R-precision by 23%.
03. COTI significantly outperforms prior TI methods.

Abstract

Recent large-scale generative modeling has attained unprecedented performance, especially in producing high-fidelity images driven by text prompts. Text inversion (TI), alongside the text-to-image model backbones, has been proposed as an effective technique for personalizing generation when the prompts contain user-defined, unseen, or long-tail concept tokens. Despite that, we find and show that the deployment of TI remains full of "dark magic": to name a few issues, the harsh requirement of additional datasets, arduous human effort in the loop, and a lack of robustness. In this work, we propose a much-enhanced version of TI, dubbed Controllable Textual Inversion (COTI), that resolves all the aforementioned problems and in turn delivers a robust, data-efficient, and easy-to-use framework. The core of COTI is a theoretically-guided loss objective instantiated with a comprehensive and novel…

Code & Models

Repositories

jnzju/coti (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Topic Modeling · Multimodal Machine Learning Applications