CoCoNO: Attention Contrast-and-Complete for Initial Noise Optimization in Text-to-Image Synthesis

Aravindan Sundaram; Ujjayan Pal; Abhimanyu Chauhan; Aishwarya Agarwal; Srikrishna Karanam

arXiv:2411.16783·cs.CV·November 27, 2024

Open Access

TL;DR

CoCoNO introduces a novel noise optimization method for text-to-image synthesis that enhances semantic accuracy by addressing attention neglect and interference through specialized loss functions, improving alignment without retraining models.

Contribution

The paper proposes CoCoNO, a new algorithm that leverages attention contrast and completeness losses to improve initial latent optimization in text-to-image diffusion models, without retraining base models.

Findings

01 Significantly improves text-image alignment.

02 Outperforms current state-of-the-art methods.

03 Effective across multiple benchmarks.

Abstract

Despite recent advancements in text-to-image models, achieving semantically accurate images in text-to-image diffusion models is a persistent challenge. While existing initial latent optimization methods have demonstrated impressive performance, we identify two key limitations: (a) attention neglect, where the synthesized image omits certain subjects from the input prompt because they do not have a designated segment in the self-attention map despite having a high-response cross-attention, and (b) attention interference, where the generated image has mixed-up properties of multiple subjects because of a conflicting overlap between cross- and self-attention maps of different subjects. To address these limitations, we introduce CoCoNO, a new algorithm that optimizes the initial latent by leveraging the complementary information within self-attention and cross-attention maps. Our…
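
The two failure modes named in the abstract can be made concrete with a toy diagnostic. The sketch below is not the paper's method or its loss functions, which this page does not give. It assumes per-subject cross-attention maps as small 2D arrays and a segment-id map that is taken to have been derived from self-attention beforehand. The function names `overlap` and `neglected_subjects`, and both thresholds, are hypothetical choices made for illustration only.

```python
import numpy as np


def normalize(m):
    """Scale a non-negative map so that it sums to 1 (all-zero maps stay zero)."""
    m = np.asarray(m, dtype=float)
    total = m.sum()
    return m / total if total > 0 else m


def overlap(map_a, map_b):
    """Overlap in [0, 1] between two maps: sum of pixelwise minima after
    normalizing each map. Illustrative proxy for "interference" when map_a is
    one subject's cross-attention and map_b is another subject's region."""
    return float(np.minimum(normalize(map_a), normalize(map_b)).sum())


def neglected_subjects(cross_maps, segments, peak_thresh=0.5, cover_thresh=0.5):
    """Return subjects with a strong cross-attention peak but no segment in
    which they are the dominant subject on most pixels.

    cross_maps: dict of subject name -> HxW array with values in [0, 1]
    segments:   HxW integer array of segment ids (ASSUMED to come from
                self-attention; how it is computed is outside this sketch)
    Both thresholds are hypothetical, not values from the paper.
    """
    names = list(cross_maps)
    stack = np.stack([np.asarray(cross_maps[n], dtype=float) for n in names])
    owner = stack.argmax(axis=0)  # index of the dominant subject per pixel
    neglected = []
    for i, name in enumerate(names):
        if stack[i].max() < peak_thresh:
            continue  # weak response everywhere: not the case described above
        owns_a_segment = any(
            (owner[segments == s] == i).mean() >= cover_thresh
            for s in np.unique(segments)
        )
        if not owns_a_segment:
            neglected.append(name)
    return neglected
```

In this toy setting, a subject whose cross-attention is high but always dominated by another subject is reported by `neglected_subjects`, and a large `overlap` between one subject's cross-attention map and another subject's segment mask signals possible interference. An initial-latent optimizer would turn quantities of this kind into differentiable losses and backpropagate them to the initial noise, but the specific losses CoCoNO uses are not reproduced here.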

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction

Methods: Softmax · Attention Is All You Need · Diffusion · Balanced Selection