Divide & Bind Your Attention for Improved Generative Semantic Nursing

Yumeng Li; Margret Keuper; Dan Zhang; Anna Khoreva

arXiv:2307.10864 · cs.CV · July 16, 2024 · 1 citation

TL;DR

This paper introduces Divide & Bind, a Generative Semantic Nursing method that optimizes cross-attention at inference time to improve semantic fidelity and attribute binding in text-to-image generation.

Contribution

It proposes two new loss functions for Generative Semantic Nursing, significantly improving attribute binding and semantic adherence in complex prompts.

Findings

01 Enhanced attribute binding in complex prompts
02 Superior performance on multiple benchmarks
03 Improved semantic fidelity in generated images

Abstract

Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attention during inference time to better incorporate the semantics. It demonstrates promising results in generating simple prompts, e.g., "a cat and a dog". However, its efficacy declines when dealing with more complex prompts, and it does not explicitly address the problem of improper attribute binding. To address the challenges posed by complex prompts or scenarios involving multiple entities and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss…
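The abstract describes GSN as optimizing cross-attention at inference time. The toy sketch below illustrates that general idea only: it nudges a latent so that the weakest subject token receives more attention. It uses an Attend & Excite-style objective (1 minus the peak attention of the most neglected subject token), not the Divide & Bind losses, which the truncated abstract above does not spell out. The names `cross_attention`, `excite_style_loss`, and `gsn_step` are hypothetical, the attention model is a plain softmax rather than a diffusion U-Net, and finite differences stand in for backpropagation.

```python
import math

def cross_attention(latent, token_emb):
    """Toy cross-attention: for each spatial patch, a softmax over text tokens."""
    attn = []
    for patch in latent:
        logits = [sum(p * e for p, e in zip(patch, emb)) for emb in token_emb]
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        attn.append([w / total for w in weights])
    return attn  # indexed as attn[patch][token]

def excite_style_loss(latent, token_emb, subject_ids):
    """Assumed Attend & Excite-style loss: 1 - peak attention of the weakest subject."""
    attn = cross_attention(latent, token_emb)
    return max(1.0 - max(row[s] for row in attn) for s in subject_ids)

def numerical_grad(loss_fn, latent, eps=1e-5):
    """Central finite differences; a stand-in for autograd in this toy setting."""
    grad = [[0.0] * len(row) for row in latent]
    for i, row in enumerate(latent):
        for j in range(len(row)):
            original = row[j]
            row[j] = original + eps
            up = loss_fn(latent)
            row[j] = original - eps
            down = loss_fn(latent)
            row[j] = original  # restore the entry before moving on
            grad[i][j] = (up - down) / (2 * eps)
    return grad

def gsn_step(latent, token_emb, subject_ids, step_size=0.5):
    """One gradient step on the latent; the input latent is left unmodified."""
    work = [list(row) for row in latent]
    grad = numerical_grad(
        lambda z: excite_style_loss(z, token_emb, subject_ids), work
    )
    return [
        [z - step_size * g for z, g in zip(z_row, g_row)]
        for z_row, g_row in zip(latent, grad)
    ]

# Hypothetical data: token 0 is a filler word, tokens 1 and 2 are the subjects.
token_emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
latent = [[2.0, 0.0], [0.0, 1.0]]  # two spatial patches with two features each
subjects = [1, 2]

loss_before = excite_style_loss(latent, token_emb, subjects)
updated = gsn_step(latent, token_emb, subjects)
loss_after = excite_style_loss(updated, token_emb, subjects)
print(loss_before, loss_after)
```

In this example only the patch that carries the weaker subject token is moved, because the loss depends on the single most neglected token. A real pipeline would repeat such a step at several denoising timesteps and differentiate through the model's actual attention layers.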

Code & Models

Repositories

boschresearch/Divide-and-Bind (PyTorch, official)

Taxonomy

Topics: Machine Learning in Healthcare · Computational and Text Analysis Methods · Multimodal Machine Learning Applications

Methods: Latent Diffusion Model · Diffusion