Divide & Bind Your Attention for Improved Generative Semantic Nursing
Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva

TL;DR
This paper introduces Divide & Bind, an inference-time method that optimizes cross-attention to improve semantic fidelity and attribute binding in text-to-image generation.
Contribution
The paper proposes two new loss functions for Generative Semantic Nursing (GSN), aimed at better attribute binding and closer adherence to complex prompts.
Findings
Enhanced attribute binding in complex prompts
Superior performance on multiple benchmarks
Improved semantic fidelity in generated images
Abstract
Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attention during inference time to better incorporate the semantics. It demonstrates promising results in generating simple prompts, e.g., "a cat and a dog". However, its efficacy declines when dealing with more complex prompts, and it does not explicitly address the problem of improper attribute binding. To address the challenges posed by complex prompts or scenarios involving multiple entities and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss…
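The GSN idea mentioned in the abstract (adjusting the latent at inference time so that cross-attention better covers the prompt's subject tokens) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the 2-D "latent", the hand-picked token keys, the finite-difference gradient, and the names `cross_attention`, `neglect_loss`, and `gsn_step` are hypothetical and come from neither paper's code. The loss is a simplified objective in the style of Attend & Excite (penalize the subject token with the weakest peak attention); it is not one of the Divide & Bind losses, which the truncated abstract above does not spell out.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latent, keys):
    """Return attn[p][t]: attention of latent position p to prompt token t."""
    return [
        softmax([sum(z * k for z, k in zip(pixel, key)) for key in keys])
        for pixel in latent
    ]


def neglect_loss(latent, keys, subject_tokens):
    """Toy Attend & Excite-style loss: 1 - peak attention of the weakest subject token."""
    attn = cross_attention(latent, keys)
    return max(1.0 - max(row[t] for row in attn) for t in subject_tokens)


def numerical_grad(f, latent, eps=1e-4):
    """Central-difference gradient of f w.r.t. every latent entry (toy stand-in for autograd)."""
    grad = [[0.0] * len(pixel) for pixel in latent]
    for p, pixel in enumerate(latent):
        for d in range(len(pixel)):
            original = pixel[d]
            pixel[d] = original + eps
            up = f(latent)
            pixel[d] = original - eps
            down = f(latent)
            pixel[d] = original  # restore the entry before moving on
            grad[p][d] = (up - down) / (2 * eps)
    return grad


def gsn_step(latent, keys, subject_tokens, step_size=0.5):
    """One latent update: z <- z - step_size * dL/dz."""
    grad = numerical_grad(lambda z: neglect_loss(z, keys, subject_tokens), latent)
    return [
        [z - step_size * g for z, g in zip(pixel, grad_row)]
        for pixel, grad_row in zip(latent, grad)
    ]


# Hypothetical setup: token 0 is a filler word, tokens 1 and 2 are the subjects.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
latent = [[1.0, 0.1], [0.8, -0.2], [0.5, 0.3], [0.9, 0.0]]
subject_tokens = [1, 2]

initial_loss = neglect_loss(latent, keys, subject_tokens)
for _ in range(20):
    latent = gsn_step(latent, keys, subject_tokens)
final_loss = neglect_loss(latent, keys, subject_tokens)

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In an actual diffusion pipeline the attention maps would come from the denoising network's cross-attention layers and the gradient from automatic differentiation at selected denoising steps; the sketch only shows the shape of the update.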
Taxonomy
Topics: Machine Learning in Healthcare · Computational and Text Analysis Methods · Multimodal Machine Learning Applications
Methods: Latent Diffusion Model · Diffusion
