Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models

Yang Zhang; Teoh Tze Tzun; Lim Wei Hern; Tiviatis Sim; Kenji Kawaguchi

arXiv:2403.06381 · cs.CV · March 12, 2024 · 2 cites

TL;DR

This paper introduces attention regulation, an on-the-fly, inference-time method for diffusion models that improves semantic fidelity in text-to-image synthesis. It requires no additional training and preserves the generation capacity of the original model.

Contribution

The paper proposes a computation-efficient attention regulation technique that aligns cross-attention maps with the input prompt during inference, countering the tendency of cross-attention layers to focus disproportionately on certain tokens.

Findings

01. Outperforms baseline methods across datasets and metrics

02. Requires no additional training or fine-tuning

03. Remains computation-efficient at inference time

Abstract

Recent advancements in diffusion models have notably improved the perceptual quality of generated images in text-to-image synthesis tasks. However, diffusion models often struggle to produce images that accurately reflect the intended semantics of the associated text prompts. We examine cross-attention layers in diffusion models and observe a propensity for these layers to disproportionately focus on certain tokens during the generation process, thereby undermining semantic fidelity. To address the issue of dominant attention, we introduce attention regulation, a computation-efficient on-the-fly optimization approach at inference time to align attention maps with the input text prompt. Notably, our method requires no additional training or fine-tuning and serves as a plug-in module on a model. Hence, the generation capacity of the original model is fully preserved. We compare our…
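The dominant-attention problem and the idea of correcting it at inference time can be illustrated with a toy sketch. This is not the paper's formulation: the per-token additive bias, the squared-error objective, the uniform-share target, and the names `regulate` and `token_mass` are all assumptions chosen to keep the example small and dependency-free. Only the bias is optimized; the logits, which stand in for the frozen model, are never changed.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of logits."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(logits, bias):
    """Attention over tokens at each spatial position, with a per-token bias added."""
    return [softmax([v + b for v, b in zip(row, bias)]) for row in logits]

def token_mass(attn):
    """Average attention each token receives across all positions."""
    n_pos = len(attn)
    return [sum(row[t] for row in attn) / n_pos for t in range(len(attn[0]))]

def regulate(logits, targets, steps=500, lr=1.0):
    """Gradient descent on a per-token bias (hypothetical parameterization)
    so each target token's mass approaches the uniform share 1 / n_tokens."""
    n_pos, n_tok = len(logits), len(logits[0])
    goal = 1.0 / n_tok
    bias = [0.0] * n_tok
    for _ in range(steps):
        attn = attention(logits, bias)
        mass = token_mass(attn)
        grad = [0.0] * n_tok
        for k in range(n_tok):
            for t in targets:
                # d mass[t] / d bias[k], from the softmax Jacobian
                dm = sum(r[t] * ((t == k) - r[k]) for r in attn) / n_pos
                grad[k] += 2.0 * (mass[t] - goal) * dm
        bias = [b - lr * g for b, g in zip(bias, grad)]
    return bias

# Made-up logits: 4 spatial positions x 3 prompt tokens; token 0 dominates.
logits = [
    [3.0, 0.5, 0.2],
    [2.5, 1.0, 0.1],
    [3.2, 0.3, 0.8],
    [2.8, 0.6, 0.4],
]
before = token_mass(attention(logits, [0.0, 0.0, 0.0]))
bias = regulate(logits, targets=[1, 2])
after = token_mass(attention(logits, bias))
print("mass before:", [round(m, 3) for m in before])
print("mass after: ", [round(m, 3) for m in after])
```

In this toy setting the optimization runs entirely on the attention inputs at inference time, which mirrors the plug-in, training-free character described in the abstract. A real diffusion model would apply such an adjustment inside its cross-attention layers during denoising, and the actual objective and constraints are defined in the paper and its repository.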

Code & Models

Repositories

yangzhang-v5/attention_regulation
PyTorch · Official

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Diffusion · Focus · ALIGN