Zero-Shot Personalization of Objects via Textual Inversion
Aniket Roy, Maitreya Suin, Rama Chellappa

TL;DR
This paper introduces a zero-shot method that personalizes diverse objects in text-to-image diffusion models quickly and without per-object training, extending fast customization beyond human subjects.
Contribution
It proposes a learned network to predict object-specific textual embeddings, enabling fast, zero-shot object personalization within diffusion models without additional training.
Findings
Effective across multiple object categories
Supports rapid, one-pass personalization
Outperforms existing identity-specific methods
Abstract
Recent advances in text-to-image diffusion models have substantially improved the quality of image customization, enabling the synthesis of highly realistic images. Despite this progress, achieving fast and efficient personalization remains a key challenge, particularly for real-world applications. Existing approaches primarily accelerate customization for human subjects by injecting identity-specific embeddings into diffusion models, but these strategies do not generalize well to arbitrary object categories, limiting their applicability. To address this limitation, we propose a novel framework that employs a learned network to predict object-specific textual inversion embeddings, which are subsequently integrated into the UNet timesteps of a diffusion model for text-conditional customization. This design enables rapid, zero-shot personalization of a wide range of objects in a single…
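The pipeline described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the linear "heads", the `S*` placeholder token, and the reading of "integrated into the UNet timesteps" as one predicted embedding per timestep bucket are all assumptions made for illustration. The sketch only shows the data flow: image features go in, a predicted textual-inversion embedding comes out, and that embedding replaces a placeholder token in the prompt conditioning.

```python
from typing import Dict, List

Vector = List[float]


def linear(weights: List[Vector], x: Vector) -> Vector:
    """Dense layer without bias: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def predict_embeddings(image_features: Vector,
                       heads: List[List[Vector]]) -> List[Vector]:
    """Hypothetical predictor: one linear head per timestep bucket.

    Stands in for the learned network; a real predictor would be a
    trained deep model, not a stack of fixed matrices.
    """
    return [linear(head, image_features) for head in heads]


def build_conditioning(prompt: List[str],
                       vocab: Dict[str, Vector],
                       placeholder: str,
                       predicted: Vector) -> List[Vector]:
    """Swap the placeholder token's embedding for the predicted one."""
    return [predicted if tok == placeholder else vocab[tok] for tok in prompt]


if __name__ == "__main__":
    # Toy 2-dimensional features and embeddings (assumed sizes).
    features = [3.0, 4.0]
    heads = [
        [[1.0, 0.0], [0.0, 2.0]],  # head for early denoising steps
        [[0.0, 1.0], [1.0, 0.0]],  # head for late denoising steps
    ]
    vocab = {"a": [0.0, 0.0], "dog": [1.0, 1.0]}
    per_step = predict_embeddings(features, heads)
    for bucket, emb in enumerate(per_step):
        cond = build_conditioning(["a", "S*", "dog"], vocab, "S*", emb)
        print(bucket, cond)
```

Because the embedding comes from a single forward pass of the predictor, no per-object optimization loop is needed at inference time, which is what distinguishes this setup from classic textual inversion.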
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Domain Adaptation and Few-Shot Learning
