Training-Free Object-Background Compositional T2I via Dynamic Spatial Guidance and Multi-Path Pruning
Yang Deng, David Mould, Paul L. Rosin, Yu-Kun Lai

TL;DR
This paper introduces a training-free method for improving object-background compositionality in text-to-image diffusion models by using dynamic spatial guidance and multi-path pruning, enhancing scene coherence.
Contribution
It presents a novel training-free framework with dynamic spatial guidance and multi-path pruning to better model foreground-background interactions in diffusion models.
Findings
Improved background coherence in generated images.
Enhanced object-background compositional alignment.
Consistent performance gains across multiple diffusion backbones.
Abstract
Existing text-to-image diffusion models, while excelling at subject synthesis, exhibit a persistent foreground bias that treats the background as a passive and under-optimized byproduct. This imbalance compromises global scene coherence and constrains compositional control. To address the limitation, we propose a training-free framework that restructures diffusion sampling to explicitly account for foreground-background interactions. Our approach consists of two key components. First, Dynamic Spatial Guidance introduces a soft, time step dependent gating mechanism that modulates foreground and background attention during the diffusion process, enabling spatially balanced generation. Second, Multi-Path Pruning performs multi-path latent exploration and dynamically filters candidate trajectories using both internal attention statistics and external semantic alignment signals, retaining…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
