SceneBooth: Diffusion-based Framework for Subject-preserved Text-to-Image Generation

Shang Chai; Zihang Lin; Min Zhou; Xubin Li; Liansheng Zhuang; Houqiang Li

arXiv:2501.03490·cs.CV·January 8, 2025

TL;DR

SceneBooth is a diffusion-based framework that preserves the subject's appearance in text-to-image generation by fixing the subject image and generating backgrounds guided by scene layouts and text prompts.

Contribution

It introduces a novel approach that fixes the subject image and generates backgrounds, improving subject fidelity and scene harmony in text-to-image synthesis.

Findings

01 Outperforms baseline methods in subject preservation

02 Enhances image harmony and overall quality

03 Effectively integrates scene layouts with diffusion models

Abstract

Due to the demand for personalized image generation, subject-driven text-to-image generation, which creates novel renditions of an input subject based on text prompts, has received growing research interest. Existing methods often learn a subject representation and incorporate it into the prompt embedding to guide image generation, but they struggle to preserve subject fidelity. To solve this issue, this paper proposes a novel framework named SceneBooth for subject-preserved text-to-image generation, which takes as input a subject image, object phrases, and text prompts. Instead of learning the subject representation and generating a subject, SceneBooth fixes the given subject image and generates its background guided by the text prompts. To this end, SceneBooth introduces two key components, i.e., a multimodal layout generation module and a background…
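The core idea stated in the abstract, keeping the subject's pixels fixed and filling in only the background, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `place_subject` and `composite`, the `top_left` placement argument, and the use of a plain array as a stand-in for a generated background are all assumptions made for illustration. In the actual framework the placement and the background would come from learned modules.

```python
import numpy as np

def place_subject(subject_rgb, subject_mask, canvas_hw, top_left):
    """Paste a masked subject onto an empty canvas (hypothetical helper).

    subject_rgb:  (h, w, 3) float array, the given subject image
    subject_mask: (h, w) float array, 1.0 on subject pixels, 0.0 elsewhere
    canvas_hw:    (H, W) size of the output image
    top_left:     (row, col) where the subject is placed; assumed to fit
    Returns the canvas and a full-size mask of pixels that must stay fixed.
    """
    H, W = canvas_hw
    h, w = subject_mask.shape
    top, left = top_left
    canvas = np.zeros((H, W, 3), dtype=np.float32)
    keep = np.zeros((H, W), dtype=np.float32)
    canvas[top:top + h, left:left + w] = subject_rgb * subject_mask[..., None]
    keep[top:top + h, left:left + w] = subject_mask
    return canvas, keep

def composite(canvas, keep, background):
    """Keep subject pixels untouched; take every other pixel from the background."""
    k = keep[..., None]  # broadcast the mask over the RGB channels
    return k * canvas + (1.0 - k) * background
```

Because the mask is binary, every subject pixel in the output equals the input subject pixel exactly, which is the sense in which fixing the subject sidesteps the fidelity loss of regenerating it.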

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques

Methods: Latent Diffusion Model · Diffusion · ALIGN