Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models
Luozhou Wang, Guibao Shen, Wenhang Ge, Guangyong Chen, Yijun Li, Ying-cong Chen

TL;DR
This paper introduces Text-Anchored Score Composition (TASC), a training-free method that improves controllability in text-to-image diffusion models when provided with partially aligned conditions, by decomposing and realigning condition pairs.
Contribution
TASC is a novel, training-free approach that separates and realigns condition pairs to handle misalignment issues in controllable image generation.
Findings
Effective in handling unaligned conditions
Outperforms recent methods in qualitative and quantitative tests
Adds flexibility to controllable image synthesis
Abstract
Text-to-image diffusion models have advanced towards more controllable generation by supporting various additional conditions (e.g., depth maps, bounding boxes) beyond text. However, these models are trained on the premise of perfect alignment between the text and the extra conditions. If this alignment is not satisfied, the final output may be dominated by one condition, or ambiguity may arise, failing to meet user expectations. To address this issue, we present a training-free approach called Text-Anchored Score Composition (TASC) to further improve the controllability of existing models when provided with partially aligned conditions. TASC first separates conditions based on pair relationships, computing the result individually for each pair. This ensures that each pair no longer has conflicting conditions. Then we propose an attention realignment operation to realign…
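The per-pair computation described in the abstract can be illustrated with a small sketch. It assumes that each (text, condition) pair yields its own noise prediction and that these predictions are merged with the weighted-sum rule used in composable diffusion and classifier-free guidance, eps = eps_uncond + sum_i w_i (eps_i - eps_uncond). That rule is an assumption: the abstract is truncated before stating how TASC actually combines the pair results, and the attention realignment step is not modeled here. The names `compose_scores`, `toy_denoiser`, and the `Denoiser` signature are hypothetical, and plain lists stand in for latent tensors.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
# Hypothetical denoiser interface: (noisy latent, text prompt, extra condition) -> noise prediction.
Denoiser = Callable[[Vector, str, Optional[float]], Vector]


def compose_scores(
    latent: Vector,
    pairs: Sequence[Tuple[str, Optional[float]]],
    denoiser: Denoiser,
    weights: Sequence[float],
) -> Vector:
    """Combine per-pair noise predictions around an unconditional prediction.

    Assumed rule (composable-diffusion style, not taken from the TASC paper):
        eps = eps_uncond + sum_i w_i * (eps_i - eps_uncond)
    """
    if len(pairs) != len(weights):
        raise ValueError("need exactly one weight per condition pair")
    # Unconditional prediction: empty prompt and no extra condition.
    eps_uncond = denoiser(latent, "", None)
    combined = list(eps_uncond)
    for (text, cond), w in zip(pairs, weights):
        # Each (text, condition) pair is evaluated in its own call, so the
        # inputs of one pair never conflict with another pair's inputs.
        eps_i = denoiser(latent, text, cond)
        for k in range(len(combined)):
            combined[k] += w * (eps_i[k] - eps_uncond[k])
    return combined


def toy_denoiser(latent: Vector, text: str, cond: Optional[float]) -> Vector:
    # Toy stand-in for a real network: shifts every coordinate by the prompt
    # length plus the numeric condition (0 when the condition is absent).
    shift = len(text) + (cond if cond is not None else 0.0)
    return [x + shift for x in latent]


if __name__ == "__main__":
    pairs = [("a cat", 1.0), ("a dog", 2.0)]
    print(compose_scores([0.0, 1.0], pairs, toy_denoiser, [0.5, 0.5]))  # prints [6.5, 7.5]
```

With a single pair and weight 1 the rule reduces to that pair's own prediction; unequal weights trade off how strongly each pair steers the sample.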
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
