MultiBooth: Towards Generating All Your Concepts in an Image from Text

Chenyang Zhu; Kai Li; Yue Ma; Chunming He; Xiu Li

arXiv:2404.14239 · cs.CV · April 1, 2025 · 1 citation

TL;DR

MultiBooth is a new method for multi-concept image generation from text that improves fidelity and efficiency by dividing the process into learning individual concepts and integrating them with bounding boxes.

Contribution

It introduces a two-phase approach with a multi-modal encoder and bounding box guidance, enhancing multi-concept image generation in diffusion models.

Findings

1. Outperforms baselines in qualitative evaluations
2. Achieves higher concept fidelity
3. Reduces inference cost

Abstract

This paper introduces MultiBooth, a novel and efficient technique for multi-concept customization in image generation from text. Despite the significant advancements in customized generation methods, particularly with the success of diffusion models, existing methods often struggle with multi-concept scenarios due to low concept fidelity and high inference cost. MultiBooth addresses these issues by dividing the multi-concept generation process into two phases: a single-concept learning phase and a multi-concept integration phase. During the single-concept learning phase, we employ a multi-modal image encoder and an efficient concept encoding technique to learn a concise and discriminative representation for each concept. In the multi-concept integration phase, we use bounding boxes to define the generation area for each concept within the cross-attention map. This method enables the…
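The bounding-box idea in the integration phase can be illustrated with a small sketch. This is not the authors' implementation: it only assumes that each concept is tied to one prompt token, that a cross-attention map is available per token as a height x width grid, and that a box is given in grid cells as `(x0, y0, x1, y1)` with exclusive upper bounds. The names `box_mask` and `restrict_attention` are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a box is (x0, y0, x1, y1) in attention-map cells, upper bounds exclusive.
Box = Tuple[int, int, int, int]
Grid = List[List[float]]


def box_mask(height: int, width: int, box: Box) -> Grid:
    """Return a height x width grid that is 1.0 inside the box and 0.0 outside."""
    x0, y0, x1, y1 = box
    return [
        [1.0 if (x0 <= x < x1 and y0 <= y < y1) else 0.0 for x in range(width)]
        for y in range(height)
    ]


def restrict_attention(attn: List[Grid], concept_boxes: Dict[int, Box]) -> List[Grid]:
    """Zero each concept token's attention outside its bounding box.

    attn[t] is the spatial attention map for prompt token t.
    concept_boxes maps a concept's token index to its box (hypothetical layout).
    Tokens without a box are copied through unchanged.
    """
    restricted = []
    for token_index, attn_map in enumerate(attn):
        if token_index not in concept_boxes:
            restricted.append([row[:] for row in attn_map])
            continue
        mask = box_mask(len(attn_map), len(attn_map[0]), concept_boxes[token_index])
        restricted.append(
            [[a * m for a, m in zip(row, mask_row)] for row, mask_row in zip(attn_map, mask)]
        )
    return restricted


# Toy usage: two tokens on a 4 x 4 grid; token 1 is a concept limited to the top-left 2 x 2 area.
uniform = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
limited = restrict_attention(uniform, {1: (0, 0, 2, 2)})
```

In a real diffusion pipeline the maps would be tensors inside the U-Net's cross-attention layers, and the boxes would be rescaled to each layer's resolution; the sketch only shows the masking step on plain lists.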

Code & Models

Repositories

chenyangzhu1/multibooth (official)

Videos

MultiBooth: Towards Generating All Your Concepts in an Image from Text

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Diffusion