A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis
Aishwarya Agarwal, Srikrishna Karanam, K J Joseph, Apoorv Saxena, Koustava Goswami, Balaji Vasan Srinivasan

TL;DR
This paper introduces A-STAR, a method that improves text-to-image synthesis by applying test-time attention segregation and retention losses, which enhance concept distinction and retention throughout the generation process.
Contribution
The paper proposes two novel test-time attention-based loss functions, an attention segregation loss and an attention retention loss, that help pretrained diffusion models distinguish and retain multiple concepts during image generation.
Findings
Reduced cross-attention overlap among concepts
Improved retention of concepts across denoising steps
Enhanced quality and accuracy of generated images
Abstract
While recent developments in text-to-image generative models have led to a suite of high-performing methods capable of producing creative imagery from free-form text, there are several limitations. By analyzing the cross-attention representations of these models, we notice two key issues. First, for text prompts that contain multiple concepts, there is a significant amount of pixel-space overlap (i.e., same spatial regions) among pairs of different concepts. This eventually leads to the model being unable to distinguish between the two concepts and one of them being ignored in the final generation. Next, while these models attempt to capture all such concepts during the beginning of denoising (e.g., first few steps) as evidenced by cross-attention maps, this knowledge is not retained by the end of denoising (e.g., last few steps). Such loss of knowledge eventually leads to inaccurate…
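The two issues described in the abstract can be made concrete with a small numerical sketch. The code below is an illustration only, not the paper's formulation: it assumes each concept has a non-negative 2D cross-attention map, measures overlap between two maps with a soft intersection ratio, and measures retention as one minus a soft IoU against a thresholded mask of an earlier step's map. The function names (`attention_overlap`, `segregation_loss`, `retention_loss`) and the `threshold` parameter are hypothetical and chosen for this example.

```python
import numpy as np


def attention_overlap(a, b, eps=1e-8):
    """Soft overlap between two non-negative attention maps of equal shape.

    Assumed measure: sum(min(a, b)) / (sum(a) + sum(b)).
    It is 0 for disjoint maps and about 0.5 for identical maps.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.minimum(a, b).sum() / (a.sum() + b.sum() + eps))


def segregation_loss(maps):
    """Sum of pairwise overlaps over all distinct pairs of concept maps."""
    total = 0.0
    for i in range(len(maps)):
        for j in range(i + 1, len(maps)):
            total += attention_overlap(maps[i], maps[j])
    return total


def retention_loss(current, previous, threshold=0.5, eps=1e-8):
    """One minus a soft IoU between the current map and a binary mask
    built from an earlier step's map.

    The mask keeps pixels at or above `threshold` times the earlier
    map's maximum (an assumed thresholding rule for this sketch).
    """
    current = np.asarray(current, dtype=float)
    previous = np.asarray(previous, dtype=float)
    mask = (previous >= threshold * previous.max()).astype(float)
    intersection = np.minimum(current, mask).sum()
    union = np.maximum(current, mask).sum()
    return float(1.0 - intersection / (union + eps))


# Two toy 2x2 "attention maps" for two concepts.
cat = np.array([[1.0, 0.0], [0.0, 0.0]])
dog = np.array([[0.0, 0.0], [0.0, 1.0]])

print(segregation_loss([cat, dog]))  # disjoint regions: no overlap
print(segregation_loss([cat, cat]))  # same region: maximal overlap
print(retention_loss(cat, cat))      # concept kept in place: near zero
print(retention_loss(dog, cat))      # concept drifted away: high loss
```

In a test-time setting, losses of this kind would be evaluated on the model's cross-attention maps at each denoising step and used to adjust the generation; how the paper applies them is not specified in this summary, so that step is left out of the sketch.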
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques
Methods: Diffusion
