Delta-K: Boosting Multi-Instance Generation via Cross-Attention Augmentation
Zitong Wang, Zijun Shen, Haohao Xu, Zhengjie Luo, Weibin Wu

TL;DR
Delta-K is a novel inference framework that enhances multi-instance scene synthesis in diffusion models by injecting semantic signals into the cross-attention space, improving concept coherence without retraining.
Contribution
The paper introduces Delta-K, a plug-and-play method that operates in the shared cross-attention key space to address concept omission in diffusion models, without additional training or architectural changes.
Findings
Improves compositional alignment across various diffusion architectures.
Operates without spatial masks or retraining.
Enhances semantic coherence in complex scenes.
Abstract
While diffusion models excel in text-to-image synthesis, they often suffer from concept omission when synthesizing complex multi-instance scenes. Existing training-free methods attempt to resolve this by rescaling attention maps, which merely exacerbates unstructured noise without establishing coherent semantic representations. To address this, we propose Delta-K, a backbone-agnostic and plug-and-play inference framework that tackles omission by operating directly in the shared cross-attention Key space. Specifically, using a vision-language model, we extract a differential key that encodes the semantic signature of missing concepts. This signal is then injected during the early semantic planning stage of the diffusion process. Governed by a dynamically optimized scheduling mechanism, Delta-K grounds diffuse noise into stable structural anchors while preserving existing…
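The abstract's core idea, adding a differential key to the cross-attention keys during early denoising steps, can be illustrated with a toy sketch. This is not the authors' implementation. The additive form `keys + alpha * delta_k`, the linear-decay schedule, the `early_fraction` and `alpha_max` parameters, and the random stand-in for the differential key are all assumptions made for illustration. The paper describes a dynamically optimized schedule and a key extracted with a vision-language model, neither of which is reproduced here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def injection_strength(step, num_steps, early_fraction=0.3, alpha_max=1.0):
    """Hypothetical schedule (assumption): linear decay from alpha_max to 0
    over the first `early_fraction` of steps, then no injection."""
    cutoff = early_fraction * num_steps
    if step >= cutoff:
        return 0.0
    return alpha_max * (1.0 - step / cutoff)


def cross_attention(queries, keys, values, delta_k=None, alpha=0.0):
    """Single-head cross-attention with an optional additive key offset.

    queries: (num_pixels, dim) image-side features
    keys, values: (num_tokens, dim) text-side features
    delta_k: (num_tokens, dim) stand-in for a differential key (assumption)
    """
    if delta_k is not None and alpha > 0.0:
        keys = keys + alpha * delta_k  # new array; the caller's keys are untouched
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


rng = np.random.default_rng(0)
num_pixels, num_tokens, dim = 16, 5, 8
queries = rng.normal(size=(num_pixels, dim))
keys = rng.normal(size=(num_tokens, dim))
values = rng.normal(size=(num_tokens, dim))

# Stand-in differential key: nonzero only at the (hypothetical) token
# position of the omitted concept. Random values, purely illustrative.
missing_token = 3
delta_k = np.zeros_like(keys)
delta_k[missing_token] = rng.normal(size=dim)

num_steps = 50
for step in (0, 5, 10, 20):
    alpha = injection_strength(step, num_steps)
    _, weights = cross_attention(queries, keys, values, delta_k, alpha)
    print(f"step={step:2d} alpha={alpha:.2f} "
          f"mean attention on token {missing_token}: {weights[:, missing_token].mean():.4f}")
```

In this sketch the injection only changes the keys, so the attention maps are recomputed rather than rescaled, and once `alpha` reaches zero the layer behaves exactly like ordinary cross-attention.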
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Neural Network Applications
