Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models

Luozhou Wang; Guibao Shen; Wenhang Ge; Guangyong Chen; Yijun Li; Ying-cong Chen

arXiv:2306.14408 · cs.CV · July 16, 2024 · 1 citation

TL;DR

This paper introduces Text-Anchored Score Composition (TASC), a training-free method that improves controllability in text-to-image diffusion models when provided with partially aligned conditions, by decomposing and realigning condition pairs.

Contribution

TASC is a novel, training-free approach that separates and realigns condition pairs to handle misalignment issues in controllable image generation.

Findings

01

Effective in handling unaligned conditions

02

Outperforms recent methods in qualitative and quantitative tests

03

Adds flexibility to controllable image synthesis

Abstract

Text-to-image diffusion models have advanced towards more controllable generation by supporting various additional conditions (e.g., depth maps, bounding boxes) beyond text. However, these models are trained on the premise of perfect alignment between the text and the extra conditions. If this alignment is not satisfied, the final output may be dominated by one condition, or ambiguity may arise, failing to meet user expectations. To address this issue, we present a training-free approach called Text-Anchored Score Composition (TASC) to further improve the controllability of existing models when provided with partially aligned conditions. TASC first separates conditions based on pair relationships, computing the result individually for each pair. This ensures that each pair no longer has conflicting conditions. Then we propose an attention realignment operation to realign…
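The idea of computing a result per (text, condition) pair and then composing the results can be illustrated with a minimal sketch. It assumes, hypothetically, that each pair yields its own noise prediction from an existing controllable model and that these predictions are merged with a composable-diffusion-style weighted sum of guidance terms, eps = eps_uncond + sum_i w_i (eps_i - eps_uncond). The abstract is truncated before the exact composition rule and the attention realignment step, so `compose_pair_scores`, the weights, and the toy arrays are illustrative assumptions. They are not TASC's implementation or the API of the EnVision-Research/Decompose-and-Realign repository.

```python
import numpy as np


def compose_pair_scores(eps_uncond, eps_pairs, weights):
    # Hypothetical helper: merges per-pair noise predictions with a
    # composable-diffusion-style rule (an assumption, not TASC's exact rule):
    #   eps = eps_uncond + sum_i w_i * (eps_i - eps_uncond)
    if len(eps_pairs) != len(weights):
        raise ValueError("expected one guidance weight per condition pair")
    eps_uncond = np.asarray(eps_uncond, dtype=np.float64)
    composed = eps_uncond.copy()  # do not mutate the caller's array
    for eps_i, w_i in zip(eps_pairs, weights):
        # Each (text, condition) pair contributes its own guidance direction,
        # so no single prediction has to satisfy conflicting conditions.
        composed += w_i * (np.asarray(eps_i, dtype=np.float64) - eps_uncond)
    return composed


if __name__ == "__main__":
    # Toy 2x2 "noise predictions" standing in for denoiser outputs.
    eps_uncond = np.zeros((2, 2))
    eps_text_depth = np.ones((2, 2))      # hypothetical (sub-prompt, depth map) pair
    eps_text_box = 2.0 * np.ones((2, 2))  # hypothetical (sub-prompt, bounding box) pair
    out = compose_pair_scores(eps_uncond, [eps_text_depth, eps_text_box], [7.5, 7.5])
    print(out)  # every entry is 7.5 * 1 + 7.5 * 2 = 22.5
```

Keeping one guidance weight per pair is a design choice of this sketch: it lets a caller strengthen or weaken an individual condition without retraining, in the same training-free spirit the abstract describes.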

Code & Models

Repositories

EnVision-Research/Decompose-and-Realign
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Domain Adaptation and Few-Shot Learning

Methods: Diffusion