LocRef-Diffusion: Tuning-Free Layout and Appearance-Guided Generation

Fan Deng; Yaguang Wu; Xinyang Yu; Xiangjun Huang; Jian Yang; Guangyu Yan; Qiang Xu

arXiv:2411.15252 · cs.CV · November 26, 2024

TL;DR

LocRef-Diffusion is a tuning-free diffusion model enabling personalized, controllable image generation with precise instance placement and appearance matching, advancing the capabilities of text-to-image synthesis.

Contribution

The paper introduces LocRef-Diffusion, a tuning-free model that controls both the layout and the appearance of multiple instances in a generated image. A Layout-net handles instance placement, and an appearance-net keeps each instance faithful to its reference image.

Findings

01. Achieves state-of-the-art results on COCO and OpenImages datasets.

02. Effectively controls instance layout and appearance in generated images.

03. Demonstrates superior performance over existing methods.

Abstract

Recently, text-to-image models based on diffusion have achieved remarkable success in generating high-quality images. However, the challenge of personalized, controllable generation of instances within these images remains an area in need of further development. In this paper, we present LocRef-Diffusion, a novel, tuning-free model capable of personalized customization of multiple instances' appearance and position within an image. To enhance the precision of instance placement, we introduce a Layout-net, which controls instance generation locations by leveraging both explicit instance layout information and an instance region cross-attention module. To improve the appearance fidelity to reference images, we employ an appearance-net that extracts instance appearance features and integrates them into the diffusion model through cross-attention mechanisms. We conducted extensive…
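The abstract names two ingredients, an instance region cross-attention module and appearance features injected through cross-attention, without giving their details here. The following is only a minimal NumPy sketch of the general idea of region-restricted cross-attention, not the authors' implementation: image tokens on a grid attend only to the appearance tokens of the instance whose bounding box covers them. The function names (`boxes_to_region_mask`, `region_masked_cross_attention`), the normalized `(x0, y0, x1, y1)` box format, and the choice to return zeros for tokens outside every box are all assumptions made for illustration.

```python
import numpy as np


def boxes_to_region_mask(grid_h, grid_w, boxes, tokens_per_instance):
    """Build an (N, M) boolean mask, N = grid_h * grid_w image tokens,
    M = len(boxes) * tokens_per_instance appearance tokens.

    Hypothetical helper: boxes are (x0, y0, x1, y1) in [0, 1] coordinates,
    and an image token belongs to a box if its cell center lies inside it.
    """
    ys = (np.arange(grid_h) + 0.5) / grid_h
    xs = (np.arange(grid_w) + 0.5) / grid_w
    cy, cx = np.meshgrid(ys, xs, indexing="ij")
    columns = []
    for x0, y0, x1, y1 in boxes:
        inside = (cx >= x0) & (cx <= x1) & (cy >= y0) & (cy <= y1)
        # Every appearance token of this instance shares the same region.
        columns.append(np.repeat(inside.reshape(-1, 1), tokens_per_instance, axis=1))
    return np.concatenate(columns, axis=1)


def region_masked_cross_attention(queries, keys, values, region_mask):
    """Scaled dot-product cross-attention restricted by region_mask.

    queries: (N, d) image-token features.
    keys, values: (M, d) appearance-token features.
    region_mask: (N, M) boolean, True where attention is allowed.
    Tokens with no allowed key receive a zero vector (an assumption).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores = np.where(region_mask, scores, -np.inf)

    # Numerically stable softmax that tolerates fully masked rows.
    has_any = region_mask.any(axis=1, keepdims=True)
    row_max = np.where(has_any, scores.max(axis=1, keepdims=True), 0.0)
    weights = np.exp(scores - row_max)
    denom = weights.sum(axis=1, keepdims=True)
    weights = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    return weights @ values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = boxes_to_region_mask(4, 4, [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)], 2)
    q = rng.normal(size=(16, 8))
    k = rng.normal(size=(4, 8))
    v = rng.normal(size=(4, 8))
    print(region_masked_cross_attention(q, k, v, mask).shape)
```

In a diffusion backbone, an output like this would typically be added back to the image-token features as a residual; how LocRef-Diffusion combines it with text cross-attention is not stated in the text above.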

Taxonomy

Topics: Face recognition and analysis

Methods: Diffusion