Local Prompt Adaptation for Style-Consistent Multi-Object Generation in Diffusion Models
Ankit Sanjyal

TL;DR
This paper introduces Local Prompt Adaptation (LPA), a training-free method that improves style consistency and spatial coherence in multi-object, diffusion-based image generation. It splits the prompt into content and style tokens and injects each group into the U-Net's attention layers at chosen timesteps.
Contribution
LPA is a novel, training-free approach that improves style and layout control in diffusion models through prompt splitting and selective attention injection.
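The prompt-splitting step can be illustrated with a minimal sketch. It assumes a comma-separated prompt and a hypothetical keyword list (`STYLE_KEYWORDS`) that flags a clause as a style instruction. The paper's actual parser is not described in this summary, so this is only one plausible way to separate content clauses from style clauses.

```python
from typing import List, Tuple

# Hypothetical keyword list: any clause containing one of these is treated as style.
# This is an illustrative assumption, not the parser used by LPA.
STYLE_KEYWORDS = {"style", "watercolor", "oil painting", "sketch",
                  "cyberpunk", "pixel art", "impressionist"}


def split_prompt(prompt: str) -> Tuple[List[str], List[str]]:
    """Split a comma-separated prompt into (content clauses, style clauses)."""
    content: List[str] = []
    style: List[str] = []
    for clause in (c.strip() for c in prompt.split(",")):
        if not clause:
            continue  # skip empty fragments such as trailing commas
        lowered = clause.lower()
        if any(keyword in lowered for keyword in STYLE_KEYWORDS):
            style.append(clause)
        else:
            content.append(clause)
    return content, style


if __name__ == "__main__":
    objects, styles = split_prompt("a cat, a red car, a wooden bench, watercolor style")
    print(objects)  # ['a cat', 'a red car', 'a wooden bench']
    print(styles)   # ['watercolor style']
```

A keyword match keeps the example self-contained. A real parser would more likely use syntactic analysis, which is presumably what the abstract's "parser settings" vary.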
Findings
LPA improves CLIP-prompt alignment by +0.41% over vanilla SDXL.
LPA achieves +0.09% CLIP-prompt and +0.08% CLIP-style gains on style-rich benchmarks.
LPA is model-agnostic, easy to implement, and requires only a single configuration change.
Abstract
Diffusion models have become a powerful backbone for text-to-image generation, producing high-quality visuals from natural language prompts. However, when prompts involve multiple objects alongside global or local style instructions, the outputs often drift in style and lose spatial coherence, limiting their reliability for controlled, style-consistent scene generation. We present Local Prompt Adaptation (LPA), a lightweight, training-free method that splits the prompt into content and style tokens, then injects them selectively into the U-Net's attention layers at chosen timesteps. By conditioning object tokens early and style tokens later in the denoising process, LPA improves both layout control and stylistic uniformity without additional training cost. We conduct extensive ablations across parser settings and injection windows, finding that the best configuration -- lpa late only…
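The timing idea in the abstract can be sketched as a per-step schedule: object (content) tokens condition the early, high-noise steps that shape the layout, and style tokens condition the later steps. The window boundaries (`CONTENT_WINDOW`, `STYLE_WINDOW`) and the 40/60 split are hypothetical choices for illustration. Hooking the selected tokens into SDXL's cross-attention layers is not shown, and the paper's actual injection windows may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InjectionWindow:
    """Hypothetical half-open window [start, end) over the denoising trajectory.

    Fractions are measured from the first (noisiest) step, so 0.0 is the start
    of denoising and 1.0 is the end.
    """
    start: float
    end: float

    def contains(self, step: int, num_steps: int) -> bool:
        frac = step / num_steps  # step runs 0 .. num_steps - 1, so frac is in [0, 1)
        return self.start <= frac < self.end


# Illustrative assumption: content tokens for the first 40% of steps,
# style tokens for the remaining 60%.
CONTENT_WINDOW = InjectionWindow(0.0, 0.4)
STYLE_WINDOW = InjectionWindow(0.4, 1.0)


def tokens_for_step(step: int, num_steps: int,
                    content: List[str], style: List[str]) -> List[str]:
    """Return the token groups that would be injected at a given denoising step."""
    if num_steps <= 0 or not 0 <= step < num_steps:
        raise ValueError("step must be in [0, num_steps)")
    active: List[str] = []
    if CONTENT_WINDOW.contains(step, num_steps):
        active += content
    if STYLE_WINDOW.contains(step, num_steps):
        active += style
    return active


if __name__ == "__main__":
    for i in range(10):
        print(i, tokens_for_step(i, 10, ["a cat", "a red car"], ["watercolor style"]))
```

Separating the two windows is the point of the design: the layout is largely fixed before style tokens are introduced, which is one way to read the abstract's claim that early object conditioning helps layout control while later style conditioning helps stylistic uniformity.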
