Layer-wise Instance Binding for Regional and Occlusion Control in Text-to-Image Diffusion Transformers
Ruidong Chen, Yancheng Bai, Xuanpu Zhang, Jianhao Zeng, Lanjun Wang, Dan Song, Lei Sun, Xiangxiang Chu, Anan Liu

TL;DR
LayerBind introduces a training-free, layer-wise approach to regional and occlusion control in text-to-image diffusion models, enabling precise editing and rearrangement of generated images with improved flexibility and quality.
Contribution
The paper proposes LayerBind, a novel layer-wise instance binding method that achieves controllable regional and occlusion editing in diffusion transformers without additional training.
Findings
Effective regional and occlusion control demonstrated
Supports flexible editing workflows
Outperforms existing methods in quality and controllability
Abstract
Region-instructed layout control in text-to-image generation is highly practical, yet existing methods suffer from two limitations: (i) training-based approaches inherit data bias and often degrade image quality, and (ii) current techniques struggle with occlusion order, limiting real-world usability. To address these issues, we propose LayerBind. By modeling regional generation as distinct layers and binding them during generation, our method enables precise regional and occlusion controllability. Our motivation stems from the observation that spatial layout and occlusion are established at a very early denoising stage, suggesting that rearranging the early latent structure is sufficient to modify the final output. Building on this, we structure the scheme into two phases: instance initialization and subsequent semantic nursing. (1) First, leveraging the contextual sharing mechanism in…
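The abstract's core observation is that layout and occlusion are fixed early in denoising, so rearranging an early latent can change the final image. As a rough illustration of how layers with an occlusion order could be bound on a latent, here is a minimal NumPy sketch. It assumes that each instance has its own latent of shape (C, H, W) and a region mask of shape (H, W), and that layers are listed from back to front. The function `bind_layers` and its arguments are hypothetical. This is not LayerBind's actual procedure, which the truncated abstract does not fully describe; it also involves contextual sharing and a semantic nursing phase.

```python
import numpy as np


def bind_layers(background, layers):
    """Hypothetical sketch: composite per-instance latents back to front.

    background: array of shape (C, H, W), the base latent.
    layers: list of (latent, mask) pairs ordered from farthest to nearest;
            each latent has shape (C, H, W), each mask has shape (H, W)
            with values in [0, 1].
    Nearer layers overwrite farther ones where their masks overlap, which
    is how occlusion order is expressed in this illustration.
    """
    out = background.copy()  # leave the caller's background untouched
    for latent, mask in layers:
        m = mask[None, :, :]  # broadcast the 2D mask across channels
        out = m * latent + (1.0 - m) * out
    return out


# Usage: two overlapping instances; swapping the order flips who occludes whom.
bg = np.zeros((1, 4, 4))
a, mask_a = np.ones((1, 4, 4)), np.zeros((4, 4))
mask_a[:, :3] = 1.0  # instance A covers columns 0-2
b, mask_b = np.full((1, 4, 4), 2.0), np.zeros((4, 4))
mask_b[:, 1:] = 1.0  # instance B covers columns 1-3

b_in_front = bind_layers(bg, [(a, mask_a), (b, mask_b)])
a_in_front = bind_layers(bg, [(b, mask_b), (a, mask_a)])
```

Compositing back to front keeps the occlusion rule explicit: the order of the `layers` list alone decides which instance is visible in overlapping regions.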
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications
