Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models

Saman Motamed; Danda Pani Paudel; Luc Van Gool

arXiv:2311.13833 · cs.CV · September 30, 2024 · 1 citation

Open Access

TL;DR

Lego is a textual inversion method for text-to-image models that disentangles personalized concepts beyond appearance and style, such as adjectives and verbs, from the subjects they are entangled with, yielding concept representations that align more closely with text descriptions.

Contribution

Lego introduces a simple Subject Separation step and a Context Loss to improve the inversion of entangled concepts in text-to-image (T2I) models, outperforming existing inversion methods.

Findings

01

In user studies, Lego-generated concepts were preferred more than 70% of the time.

02

Lego improves alignment of concepts with text descriptions.

03

Enhanced ability to invert complex, multi-word concepts.

Abstract

Text-to-Image (T2I) models excel at synthesizing concepts such as nouns, appearances, and styles. To enable customized content creation based on a few example images of a concept, methods such as Textual Inversion and DreamBooth invert the desired concept and enable synthesizing it in new scenes. However, inverting personalized concepts that go beyond object appearance and style, such as adjectives and verbs, through natural language remains a challenge. Two key characteristics of these concepts contribute to the limitations of current inversion methods: 1) adjectives and verbs are entangled with nouns (subjects) and can hinder appearance-based inversion methods, where the subject's appearance leaks into the concept embedding, and 2) describing such concepts often extends beyond single word embeddings. In this study, we introduce Lego, a textual inversion method designed to invert…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Diffusion