ViCo: Plug-and-play Visual Condition for Personalized Text-to-image Generation

Shaozhe Hao; Kai Han; Shihao Zhao; Kwan-Yee K. Wong

arXiv:2306.00971 · cs.CV · December 8, 2023 · 2 cites

TL;DR

ViCo introduces a lightweight, plug-and-play visual conditioning method for personalized text-to-image generation that does not require fine-tuning of the diffusion model, achieving state-of-the-art results efficiently.

Contribution

ViCo presents a novel visual conditioning approach that integrates into diffusion models without fine-tuning, enabling scalable and flexible personalized image generation.

Findings

01. Achieves performance comparable or superior to state-of-the-art models.

02. Trains only about 6% as many parameters as the diffusion U-Net.

03. Operates without fine-tuning the original diffusion model.
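
To make the parameter figure concrete, here is a small sketch of how such a ratio could be computed. It assumes the percentage is measured against the frozen base model's parameter count; the function name and the counts are hypothetical and are not taken from the paper.

```python
def trainable_ratio(frozen_base_params: int, added_trainable_params: int) -> float:
    """Ratio of newly trained parameters to the frozen base model's parameters."""
    if frozen_base_params <= 0:
        raise ValueError("frozen_base_params must be positive")
    return added_trainable_params / frozen_base_params


# Hypothetical counts, chosen only so that the ratio comes out at 6%.
base_params = 1_000_000   # frozen base model (assumed, not a reported figure)
added_params = 60_000     # newly added trainable module (assumed)
ratio = trainable_ratio(base_params, added_params)
print(f"{ratio:.1%}")
```

Because the base model stays frozen, only the added parameters need gradients and optimizer state during training.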

Abstract

Personalized text-to-image generation using diffusion models has recently emerged and garnered significant interest. This task learns a novel concept (e.g., a unique toy), illustrated in a handful of images, into a generative model that captures fine visual details and generates photorealistic images based on textual embeddings. In this paper, we present ViCo, a novel lightweight plug-and-play method that seamlessly integrates visual condition into personalized text-to-image generation. ViCo stands out for its unique feature of not requiring any fine-tuning of the original diffusion model parameters, thereby facilitating more flexible and scalable model deployment. This key advantage distinguishes ViCo from most existing models that necessitate partial or full diffusion fine-tuning. ViCo incorporates an image attention module that conditions the diffusion process on patch-wise visual…
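
The abstract mentions an image attention module that conditions the diffusion process on patch-level visual information. As a rough illustration of that general idea only, the dependency-free sketch below shows plain scaled dot-product cross-attention in which queries come from denoising-latent tokens and keys and values come from reference-image patch tokens. It is not the authors' implementation (see haoosz/vico for that); every name, shape, and weight here is a hypothetical stand-in.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def image_cross_attention(latent_tokens, patch_tokens, w_q, w_k, w_v):
    """Attend from latent tokens (queries) to reference-image patches (keys/values).

    Returns the attended features and the attention weights.
    """
    q = matmul(latent_tokens, w_q)
    k = matmul(patch_tokens, w_k)
    v = matmul(patch_tokens, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Gaussian-initialised matrix used as a stand-in for real features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 8
latent_tokens = random_matrix(4, dim, rng)  # hypothetical: 4 latent positions
patch_tokens = random_matrix(6, dim, rng)   # hypothetical: 6 reference patches
# In a plug-and-play setup only projections like these would be trained,
# while the base diffusion model's own weights stay untouched.
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

attended, weights = image_cross_attention(latent_tokens, patch_tokens, w_q, w_k, w_v)
print(len(attended), len(attended[0]))  # one output row per latent token
```

Each row of `weights` is a distribution over reference patches, so every latent position receives a weighted mix of patch features.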

Code & Models

Repositories

haoosz/vico (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion