User-Friendly Customized Generation with Multi-Modal Prompts

Linhao Zhong; Yan Hong; Wentao Chen; Binglin Zhou; Yiyi Zhang; Jianfu Zhang; Liqing Zhang

arXiv:2405.16501·cs.CV·May 28, 2024

TL;DR

This paper introduces a user-friendly multi-modal prompt approach for customized text-to-image generation. It requires only a single image plus accompanying text per concept, which reduces user effort while supporting more intricate customization.

Contribution

The paper proposes a multi-modal prompt method that, compared with existing finetuning-based techniques, reduces user effort and extends the customization capabilities of text-to-image models.

Findings

01. Outperforms finetune-based methods in user-friendliness

02. Enables complex object customization with minimal inputs

03. Facilitates precise scene customization

Abstract

Text-to-image generation models have seen considerable advancement, catering to the increasing interest in personalized image creation. Current customization techniques often require users to provide multiple images (typically 3-5) for each customized object, along with the classification of these objects and descriptive textual prompts for scenes. This paper questions whether the process can be made more user-friendly and the customization more intricate. We propose a method in which users need only provide images along with text for each customization topic, and which requires only a single image per visual concept. We introduce the concept of a "multi-modal prompt", a novel integration of text and images tailored to each customization concept, which simplifies user interaction and facilitates precise customization of both objects and scenes. Our proposed paradigm for customized…

Code & Models

Repositories

zhongzero/multi-modal-prompt
PyTorch · Official

Taxonomy

Topics: Advanced Software Engineering Methodologies