AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models
Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan

TL;DR
AlignIT is a novel post-processing method that improves prompt-image alignment in customized text-to-image models by adjusting key and value representations during text encoding, enhancing prompt fidelity without sacrificing customization.
Contribution
This work introduces AlignIT, a post-processing algorithm that enhances prompt alignment in customized text-to-image models by manipulating the key and value vectors produced during text encoding. It is compatible with existing customization techniques.
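AlignIT is described as acting on the key and value representations derived from the text conditioning. The NumPy sketch below illustrates one way such a post-processing step could look inside a cross-attention layer. It computes keys and values from both the customized and the baseline conditioning, keeps the customized rows only at the concept-token positions, and uses the baseline rows everywhere else. This token-wise substitution is an assumption made for illustration, and so are the names `project_kv`, `align_kv`, `cross_attention`, and `concept_idx`. It is not the authors' published algorithm.

```python
import numpy as np

# Hypothetical, illustrative sketch; not the authors' implementation.

def project_kv(cond, w_k, w_v):
    """Project text conditioning (tokens x d_text) into cross-attention keys and values."""
    return cond @ w_k, cond @ w_v

def align_kv(k_custom, v_custom, k_base, v_base, concept_idx):
    """Assumed substitution rule: keep customized K/V only at concept-token
    positions and fall back to the baseline K/V for every other token."""
    mask = np.zeros(k_custom.shape[0], dtype=bool)
    mask[list(concept_idx)] = True
    k = np.where(mask[:, None], k_custom, k_base)
    v = np.where(mask[:, None], v_custom, v_base)
    return k, v

def cross_attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy sizes and random data, purely for demonstration.
rng = np.random.default_rng(0)
n_tokens, d_text, d_attn, n_pixels = 8, 16, 4, 5
w_k = rng.normal(size=(d_text, d_attn))
w_v = rng.normal(size=(d_text, d_attn))
cond_base = rng.normal(size=(n_tokens, d_text))
# Simulate a customized encoder whose outputs drift at every token position.
cond_custom = cond_base + rng.normal(scale=2.0, size=(n_tokens, d_text))

k_c, v_c = project_kv(cond_custom, w_k, w_v)
k_b, v_b = project_kv(cond_base, w_k, w_v)
k, v = align_kv(k_c, v_c, k_b, v_b, concept_idx=[2])  # token 2 = hypothetical concept token

q = rng.normal(size=(n_pixels, d_attn))  # queries from image latents
out = cross_attention(q, k, v)
print(out.shape)  # (5, 4)
```

The sketch keeps what customization learned for the concept token while conditioning the remaining prompt tokens the way the baseline model would. Customization methods that fine-tune the projection weights rather than the embeddings could instead pass different `w_k`/`w_v` to `project_kv` for the customized branch.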
Findings
Significantly improves prompt-image alignment in customized models
Retains customization quality while enhancing prompt fidelity
Can be integrated with existing customization methods
Abstract
We consider the problem of customizing text-to-image diffusion models with user-supplied reference images. Given new prompts, existing methods can capture the key concept from the reference images but fail to align the generated image with the prompt. In this work, we address this issue by proposing new methods that can easily be used in conjunction with existing customization methods that optimize the embeddings/weights at various intermediate stages of the text encoding process. The first contribution of this paper is a dissection of the various stages of the text encoding process leading up to the conditioning vector for text-to-image models. We take a holistic view of existing customization methods and notice that the key and value outputs from this process differ substantially from those of their corresponding baseline (non-customized) models (e.g., baseline stable…
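The abstract's central observation is that customized models produce key and value outputs that differ substantially from the baseline model's. One minimal way to quantify that gap is per-token cosine similarity. This assumes that both models' keys (or values) for the same prompt are available as `(tokens x dim)` arrays. The function `per_token_cosine` and the `0.9` threshold below are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def per_token_cosine(a, b, eps=1e-8):
    """Cosine similarity between corresponding rows (one row per token) of a and b."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den

# Toy keys for a 3-token prompt (illustrative values, not measured data).
k_base = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
k_custom = np.array([[1.0, 0.0], [0.0, -1.0], [2.0, 2.0]])

cos = per_token_cosine(k_custom, k_base)  # approximately [1, -1, 1]
drifted = np.flatnonzero(cos < 0.9)       # hypothetical threshold for "drifted" tokens
print(drifted)  # [1]
```

Cosine similarity ignores magnitude, so token 2, whose key is only rescaled, is not flagged. A norm-ratio check would be needed to catch that kind of drift.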
Taxonomy
Topics: Data Visualization and Analytics · Semantic Web and Ontologies
Methods: ALIGN · Diffusion
