Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement

Zhiyuan Chang; Mingyang Li; Junjie Wang; Yi Liu; Qing Wang; Yang Liu

arXiv:2406.16272·cs.CV·September 24, 2024

TL;DR

This paper identifies and addresses the problem of catastrophic-neglect in text-to-image diffusion models, proposing an attention-guided feature enhancement method called Patcher that improves prompt adherence in generated images.

Contribution

The paper introduces Patcher, an automated approach that detects neglected objects and applies attention-guided feature enhancement to improve image-prompt alignment in T2I diffusion models.

Findings

01

Patcher achieves 10.1%-16.3% higher Correct Rate in image generation.

02

An empirical study measures the prevalence of catastrophic-neglect and explores potential mitigation strategies based on feature enhancement.

03

Experiments on multiple Stable Diffusion versions show that the proposed method is effective.

Abstract

Text-to-Image Diffusion Models (T2I DMs) have garnered significant attention for their ability to generate high-quality images from textual descriptions. However, these models often produce images that do not fully align with the input prompts, resulting in semantic inconsistencies. The most prominent issue among these semantic inconsistencies is catastrophic-neglect, where the images generated by T2I DMs miss key objects mentioned in the prompt. We first conduct an empirical study on this issue, exploring the prevalence of catastrophic-neglect, potential mitigation strategies with feature enhancement, and the insights gained. Guided by the empirical findings, we propose an automated repair approach named Patcher to address catastrophic-neglect in T2I DMs. Specifically, Patcher first determines whether there are any neglected objects in the prompt, and then applies attention-guided…
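The detection step mentioned in the abstract (deciding whether any object in the prompt was neglected) can be illustrated with a minimal sketch. It assumes that some external detector, not specified here, has already returned the object labels found in a generated image. The function name `find_neglected_objects` and the example labels are hypothetical and are not taken from the Patcher repository; the attention-guided enhancement step is not shown.

```python
def find_neglected_objects(prompt_objects, detected_labels):
    """Return prompt objects absent from the detected labels, in prompt order.

    Illustrative only: matching is a case-insensitive exact string comparison,
    which is an assumption and not the paper's detection procedure.
    """
    detected = {label.strip().lower() for label in detected_labels}
    return [obj for obj in prompt_objects if obj.strip().lower() not in detected]


if __name__ == "__main__":
    # Hypothetical prompt "a bear and a turtle"; the detector saw only a bear.
    missing = find_neglected_objects(["bear", "turtle"], ["Bear", "grass"])
    print(missing)
```

In this simplified picture, an empty result would mean that no object was neglected and no repair is needed; a non-empty result names the objects whose features would be candidates for enhancement.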

Code & Models

Repositories

lsplx/patcher
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Softmax · Attention Is All You Need · ALIGN · Diffusion