An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Rinon Gal; Yuval Alaluf; Yuval Atzmon; Or Patashnik; Amit H. Bermano; Gal Chechik; Daniel Cohen-Or

arXiv:2208.01618·cs.CV·August 3, 2022·465 cites

TL;DR

This paper introduces Textual Inversion, a method that enables personalized text-to-image generation by learning new 'words' from just a few images, allowing users to create and modify unique concepts with natural language.

Contribution

It presents a simple, effective approach to personalize text-to-image models by learning new embeddings from limited images, enhancing creative control and fidelity.

Findings

01. A single embedding can capture diverse concepts

02. Outperforms baseline methods in concept fidelity

03. Enables intuitive composition of personalized concepts

Abstract

Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new "words" in the embedding space of a frozen text-to-image model. These "words" can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts. We compare our…
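The core idea in the abstract, learning one new "word" while the model stays frozen, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the tiny vocabulary, the fixed linear map `W` standing in for a text-to-image model, the squared-error objective, and the placeholder token `S*` are hypothetical and are not the paper's model, loss, or code. The only point shown is that gradient updates touch a single embedding vector and nothing else.

```python
import random

random.seed(0)
DIM = 4
PLACEHOLDER = "S*"  # hypothetical pseudo-word standing for the new concept

# Frozen pieces (toy stand-ins, never updated below).
vocab = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in ("a", "photo", "of")}
W = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]

def encode(prompt, learned):
    """Average token embeddings, using `learned` for the placeholder token."""
    vecs = [learned if tok == PLACEHOLDER else vocab[tok] for tok in prompt.split()]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]

def generate(prompt, learned):
    """Toy 'generator': a fixed linear map applied to the prompt encoding."""
    h = encode(prompt, learned)
    return [sum(W[r][c] * h[c] for c in range(DIM)) for r in range(DIM)]

def loss_and_grad(prompt, learned, targets):
    """Mean squared error over targets and its gradient w.r.t. `learned` only.

    Assumes the placeholder appears exactly once in the prompt.
    """
    n_tokens = len(prompt.split())
    out = generate(prompt, learned)
    loss, grad = 0.0, [0.0] * DIM
    for t in targets:
        resid = [o - x for o, x in zip(out, t)]
        loss += sum(r * r for r in resid) / len(targets)
        for c in range(DIM):
            g = 2 * sum(W[r][c] * resid[r] for r in range(DIM))
            grad[c] += g / (n_tokens * len(targets))
    return loss, grad

# Four noisy "images" of one concept, mirroring the 3-5 examples in the abstract.
prompt = f"a photo of {PLACEHOLDER}"
true_vec = [random.uniform(-1, 1) for _ in range(DIM)]
targets = [[x + random.gauss(0, 0.01) for x in generate(prompt, true_vec)]
           for _ in range(4)]

# Optimize only the new embedding; vocab and W are left untouched.
learned = [0.0] * DIM
initial_loss, _ = loss_and_grad(prompt, learned, targets)
for _ in range(500):
    _, grad = loss_and_grad(prompt, learned, targets)
    learned = [v - 0.5 * g for v, g in zip(learned, grad)]
final_loss, _ = loss_and_grad(prompt, learned, targets)

# The learned "word" can then be reused in a different prompt.
composed = generate(f"photo of a {PLACEHOLDER}", learned)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Because the frozen parts never change, the learned vector behaves like any other vocabulary entry and can be dropped into new prompts, which is the sense in which the abstract describes composing these "words" into sentences.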

Videos

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques · Digital Humanities and Scholarship