CoCoNO: Attention Contrast-and-Complete for Initial Noise Optimization in Text-to-Image Synthesis
Aravindan Sundaram, Ujjayan Pal, Abhimanyu Chauhan, Aishwarya Agarwal, Srikrishna Karanam

TL;DR
CoCoNO introduces a novel noise optimization method for text-to-image synthesis that enhances semantic accuracy by addressing attention neglect and interference through specialized loss functions, improving alignment without retraining models.
Contribution
The paper proposes CoCoNO, a new algorithm that leverages attention contrast and completeness losses to improve initial latent optimization in text-to-image diffusion models, without retraining base models.
Findings
Significantly improves text-image alignment.
Outperforms current state-of-the-art methods.
Effective across multiple benchmarks.
Abstract
Despite recent advancements in text-to-image models, achieving semantically accurate images in text-to-image diffusion models is a persistent challenge. While existing initial latent optimization methods have demonstrated impressive performance, we identify two key limitations: (a) attention neglect, where the synthesized image omits certain subjects from the input prompt because they do not have a designated segment in the self-attention map despite having a high-response cross-attention, and (b) attention interference, where the generated image has mixed-up properties of multiple subjects because of a conflicting overlap between cross- and self-attention maps of different subjects. To address these limitations, we introduce CoCoNO, a new algorithm that optimizes the initial latent by leveraging the complementary information within self-attention and cross-attention maps. Our…
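To make the two failure modes named in the abstract more concrete, here is a toy sketch in plain Python. It is not the paper's method: the function names (`normalize`, `interference`, `neglect`), the formulas, and the sample values are all illustrative assumptions, chosen only to show how overlap between two subjects' attention maps, or attention mass falling outside a subject's segment, might be quantified. The actual contrast and completeness losses are not given in the text above.

```python
# Illustrative diagnostics only; these are NOT the losses defined in the paper.

def normalize(attn):
    """Scale a flat list of non-negative attention values so they sum to 1."""
    total = sum(attn)
    return [v / total for v in attn] if total > 0 else list(attn)

def interference(attn_a, attn_b):
    """Overlap of two subjects' maps: 0.0 when disjoint, 1.0 when identical."""
    a, b = normalize(attn_a), normalize(attn_b)
    return sum(min(x, y) for x, y in zip(a, b))

def neglect(cross_attn, segment_mask):
    """Share of a subject's cross-attention mass lying outside its segment."""
    c = normalize(cross_attn)
    return 1.0 - sum(v for v, inside in zip(c, segment_mask) if inside)

# Toy 2x2 maps flattened row by row (hypothetical values).
cat = [0.9, 0.1, 0.0, 0.0]
dog = [0.0, 0.0, 0.2, 0.8]

print("interference(cat, dog):", interference(cat, dog))
print("neglect(cat, top row): ", neglect(cat, [1, 1, 0, 0]))
```

In this toy setting, a high `interference` value would correspond to two subjects competing for the same pixels, and a high `neglect` value to a subject whose cross-attention response has no matching segment. An optimizer over the initial latent would presumably push both quantities down, but how CoCoNO does so is not specified here.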
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction
Methods: Softmax · Attention Is All You Need · Diffusion · Balanced Selection
