LocRef-Diffusion: Tuning-Free Layout and Appearance-Guided Generation
Fan Deng, Yaguang Wu, Xinyang Yu, Xiangjun Huang, Jian Yang, Guangyu Yan, Qiang Xu

TL;DR
LocRef-Diffusion is a tuning-free diffusion model for personalized, controllable image generation: it places multiple instances at specified positions and matches their appearance to reference images.
Contribution
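The paper introduces LocRef-Diffusion, a tuning-free model that controls both the layout and the appearance of generated instances. A Layout-net places instances using explicit layout information and an instance region cross-attention module, and an appearance-net injects features extracted from reference images through cross-attention.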
Findings
Achieves state-of-the-art results on COCO and OpenImages datasets.
Effectively controls instance layout and appearance in generated images.
Demonstrates superior performance over existing methods.
Abstract
Recently, text-to-image models based on diffusion have achieved remarkable success in generating high-quality images. However, the challenge of personalized, controllable generation of instances within these images remains an area in need of further development. In this paper, we present LocRef-Diffusion, a novel, tuning-free model capable of personalized customization of multiple instances' appearance and position within an image. To enhance the precision of instance placement, we introduce a Layout-net, which controls instance generation locations by leveraging both explicit instance layout information and an instance region cross-attention module. To improve the appearance fidelity to reference images, we employ an appearance-net that extracts instance appearance features and integrates them into the diffusion model through cross-attention mechanisms. We conducted extensive…
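The instance region cross-attention described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the normalized `(x0, y0, x1, y1)` box format, the use of identity projections in place of learned query/key/value weights, and the residual update are all assumptions made for illustration. It shows only the general idea that latent positions inside an instance's box attend to that instance's reference features, while positions outside every box are left untouched.

```python
import numpy as np

def box_mask(box, h, w):
    """Boolean (h*w,) mask of grid cells whose centres lie inside a
    normalized box (x0, y0, x1, y1). The box format is an assumption."""
    x0, y0, x1, y1 = box
    ys = (np.arange(h) + 0.5) / h  # cell-centre rows in [0, 1]
    xs = (np.arange(w) + 0.5) / w  # cell-centre columns in [0, 1]
    inside = ((ys[:, None] >= y0) & (ys[:, None] < y1)
              & (xs[None, :] >= x0) & (xs[None, :] < x1))
    return inside.reshape(-1)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def region_cross_attention(latents, boxes, ref_tokens, h, w):
    """latents: (h*w, d) image tokens; boxes: one box per instance;
    ref_tokens: one (n_i, d) array of reference features per instance.
    Hypothetical simplification: no learned projections are applied."""
    d = latents.shape[1]
    out = latents.copy()
    for box, ref in zip(boxes, ref_tokens):
        mask = box_mask(box, h, w)
        if not mask.any():
            continue  # box covers no grid cell
        q = latents[mask]                      # queries inside the box
        attn = softmax(q @ ref.T / np.sqrt(d))  # (n_in_box, n_i) weights
        out[mask] += attn @ ref                # residual update in the box only
    return out

# Usage: a 4x4 latent grid with one instance in the top-left quadrant.
h, w, d = 4, 4, 8
latents = np.zeros((h * w, d))
boxes = [(0.0, 0.0, 0.5, 0.5)]
refs = [np.ones((3, d))]
out = region_cross_attention(latents, boxes, refs, h, w)
```

A real model would add learned projections, multiple heads, and a rule for overlapping boxes; in this sketch overlapping instances simply accumulate their updates.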
Taxonomy
Topics: Face recognition and analysis
Methods: Diffusion
