Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else

Hazarapet Tunanyan; Dejia Xu; Shant Navasardyan; Zhangyang Wang; Humphrey Shi

arXiv:2310.07419 · cs.CV · October 12, 2023

TL;DR

This paper introduces a low-cost method to improve multi-concept text-to-image generation by tweaking text embeddings, overcoming limitations of existing models without additional training or inference costs.

Contribution

It proposes a novel, minimal adjustment technique for text embeddings that enhances multi-concept image synthesis in pre-trained diffusion models without retraining.

Findings

01 Outperforms previous methods on multi-concept generation

02 Improves results on image manipulation and personalization tasks

03 Requires no additional training or inference costs

Abstract

Recent advances in text-to-image diffusion models have enabled the photorealistic generation of images from text prompts. Despite the great progress, existing models still struggle to generate compositional multi-concept images naturally, limiting their ability to visualize human imagination. While several recent works have attempted to address this issue, they either introduce additional training or adopt guidance at inference time. In this work, we consider a more ambitious goal: natural multi-concept generation using a pre-trained diffusion model, and with almost no extra cost. To achieve this goal, we identify the limitations in the text embeddings used for the pre-trained text-to-image diffusion models. Specifically, we observe concept dominance and non-localized contribution that severely degrade multi-concept generation performance. We further design a minimal low-cost solution…
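To make "tweaking only the text embeddings" concrete, here is a minimal sketch of what a training-free edit to per-token embeddings can look like. It is a hypothetical illustration, not the authors' method (the abstract above is truncated before the solution is described). It assumes the per-token outputs of a frozen text encoder are available as a list of vectors, and that the indices of the tokens belonging to one concept are known. The function name `adjust_group_embeddings` and the parameters `alpha` and `temperature` are invented for this example. Each token in the group is blended with a softmax-weighted average of the other tokens in that group, where the weights come from cosine similarity; all other tokens pass through unchanged, and the diffusion model itself is never touched.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def adjust_group_embeddings(
    embs: List[Vector],
    group: List[int],
    alpha: float = 0.5,
    temperature: float = 1.0,
) -> List[Vector]:
    """Hypothetical training-free tweak (not the paper's method): blend each
    token in `group` with a similarity-weighted average of the other tokens
    in the same group. Tokens outside `group` are returned unchanged."""
    out = [list(e) for e in embs]  # copy so the input is not mutated
    for i in group:
        others = [j for j in group if j != i]
        if not others:
            continue  # a single-token group has nothing to blend with
        # Similarities are computed on the ORIGINAL embeddings.
        sims = [cosine(embs[i], embs[j]) / temperature for j in others]
        m = max(sims)
        w = [math.exp(s - m) for s in sims]  # numerically stable softmax
        z = sum(w)
        dim = len(embs[i])
        avg = [sum(wk * embs[j][d] for wk, j in zip(w, others)) / z for d in range(dim)]
        out[i] = [(1 - alpha) * embs[i][d] + alpha * avg[d] for d in range(dim)]
    return out


if __name__ == "__main__":
    # Toy 2-D "token embeddings"; tokens 0 and 2 form one hypothetical concept group.
    embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(adjust_group_embeddings(embs, group=[0, 2], alpha=0.5))
    # -> [[1.0, 0.5], [0.0, 1.0], [1.0, 0.5]]
```

In a real pipeline the vectors would be the text encoder's per-token outputs, and the adjusted embeddings would be passed to the frozen denoiser in place of the originals. Setting `alpha=0` recovers the unmodified embeddings, which makes the effect of the tweak easy to ablate.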

Taxonomy

Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques

Methods: Diffusion