TL;DR
TextBoost is an efficient one-shot method for personalizing text-to-image diffusion models. It fine-tunes only the text encoder, which reduces computational and storage costs while maintaining high-quality generation.
Contribution
The paper proposes a causality-preserving adaptation mechanism that protects the text encoder's original semantics, together with lightweight adapters that refine text embeddings just before they reach the cross-attention layers.
Findings
Faster convergence and reduced storage requirements compared to existing methods.
Maintains subject fidelity and improves text fidelity and diversity.
Achieves high-quality personalization with minimal computational overhead.
Abstract
In this paper, we introduce TextBoost, an efficient one-shot personalization approach for text-to-image diffusion models. Traditional personalization methods typically involve fine-tuning extensive portions of the model, leading to substantial storage requirements and slow convergence. In contrast, we propose selectively fine-tuning only the text encoder, significantly improving computational and storage efficiency. To preserve the original semantic integrity, we develop a novel causality-preserving adaptation mechanism. Additionally, lightweight adapters are employed to locally refine text embeddings immediately before their interaction with cross-attention layers, greatly enhancing the expressiveness of text embeddings with minimal computational overhead. Empirical evaluations across diverse concepts demonstrate that TextBoost achieves faster convergence and substantially reduces…
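The abstract says lightweight adapters refine text embeddings token by token just before cross-attention, but it does not give the adapter architecture. The sketch below is one common way such an adapter could look, not the authors' implementation: a low-rank residual module whose up-projection starts at zero, so the refined embeddings initially equal the frozen encoder's output. The class name `ResidualAdapter`, the rank, the ReLU nonlinearity, and the toy dimensions are all assumptions made for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ResidualAdapter:
    """Hypothetical low-rank residual adapter: h + up(relu(down(h)))."""

    def __init__(self, dim, rank, seed=0):
        rng = random.Random(seed)
        # Down-projection (rank x dim): small random values.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(rank)]
        # Up-projection (dim x rank): zeros, so the adapter starts as identity.
        self.up = [[0.0] * rank for _ in range(dim)]

    def num_params(self):
        """Count trainable weights in both projections."""
        return (len(self.down) * len(self.down[0])
                + len(self.up) * len(self.up[0]))

    def __call__(self, token_embeddings):
        """Refine each token embedding independently (a local update)."""
        refined = []
        for h in token_embeddings:
            hidden = [max(0.0, v) for v in matvec(self.down, h)]  # ReLU
            delta = matvec(self.up, hidden)
            refined.append([x + d for x, d in zip(h, delta)])
        return refined


# Toy stand-in for frozen text-encoder output: 4 tokens, 8 dimensions each.
rng = random.Random(1)
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]

adapter = ResidualAdapter(dim=8, rank=2)
refined = adapter(embeddings)

# A dense dim x dim refinement layer would need this many weights instead.
dense_params = 8 * 8
```

In this toy setting the adapter holds 32 weights against 64 for a dense 8 x 8 layer, and because the up-projection is zero at initialization, training would start from the unmodified embeddings. Both properties are in the spirit of the storage and semantic-preservation goals the abstract describes, though the paper's actual design may differ.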
Taxonomy
Topics: Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis
