Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization

Jimyeong Kim; Jungwon Park; Wonjong Rhee

arXiv:2403.15330 · cs.CV · March 25, 2024 · 1 cites

Open Access

TL;DR

This paper introduces SID, a novel text description strategy using multimodal GPT-4, to reduce undesired embedding entanglements in text-to-image personalization, improving image quality and alignment.

Contribution

The paper proposes SID, a new method for generating informative descriptions that mitigate biases and entanglements in text-to-image models, enhancing personalization fidelity.

Findings

01

SID reduces bias reflection in generated images.

02

Improves alignment between images and prompts.

03

Analyzes attention maps to understand bias mitigation.

Abstract

In text-to-image personalization, a timely and crucial challenge is the tendency of generated images to overfit to the biases present in the reference images. We initiate our study with a comprehensive categorization of the biases into background, nearby-object, tied-object, substance (in style re-contextualization), and pose biases. These biases manifest in the generated images due to their entanglement into the subject embedding. This undesired embedding entanglement not only results in the reflection of biases from the reference images into the generated images but also notably diminishes the alignment of the generated images with the given generation prompt. To address this challenge, we propose SID (Selectively Informative Description), a text description strategy that deviates from the prevalent approach of only characterizing the subject's class identification. SID is generated…

Taxonomy

Topics: Data Visualization and Analytics · Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Softmax · Dropout