PiCo: Enhancing Text-Image Alignment with Improved Noise Selection and Precise Mask Control in Diffusion Models

Chang Xie; Chenyi Zhuang; Pan Gao

arXiv:2505.03203·cs.CV·May 7, 2025

TL;DR

PiCo is a training-free method that improves text-image alignment in diffusion models for complex text prompts. It does this by selecting high-quality initial noise and by using precise pixel-level masks to modulate cross-attention during generation.

Contribution

PiCo introduces a noise selection and mask control approach that improves text-image alignment without any additional training.
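The abstract describes a noise selection module that assesses whether a randomly initialized noise suits the target text, using a fast sampling strategy to keep this stage cheap. The minimal sketch below shows that "pick the best seed" idea in plain Python, under stated assumptions: noise is a flat list of floats rather than a latent tensor, and `fast_preview` and `alignment_score` are hypothetical stand-ins for a few-step sampler and a text-image alignment metric. It is not PiCo's published implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple

Noise = List[float]


def sample_noise(seed: int, dim: int) -> Noise:
    """Draw a reproducible Gaussian noise vector (toy stand-in for a latent tensor)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def pick_noise(
    seeds: Sequence[int],
    dim: int,
    fast_preview: Callable[[Noise], Noise],
    alignment_score: Callable[[Noise], float],
) -> Tuple[int, float]:
    """Return (seed, score) for the candidate noise whose cheap preview scores highest.

    fast_preview: hypothetical few-step sampler mapping noise to a rough output.
    alignment_score: hypothetical text-image alignment score (higher is better).
    """
    best_seed, best_score = seeds[0], float("-inf")
    for seed in seeds:
        # Cheap preview instead of a full denoising run (the "fast sampling" idea).
        preview = fast_preview(sample_noise(seed, dim))
        score = alignment_score(preview)
        if score > best_score:  # strict ">" keeps the first seed on ties
            best_seed, best_score = seed, score
    return best_seed, best_score
```

With a real model, `fast_preview` would be a short sampling run and `alignment_score` some text-image alignment measure; both choices, and any acceptance threshold for "suitable" noise, are assumptions here.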

Findings

01 Improves alignment accuracy for complex prompts
02 Reduces reliance on random noise quality
03 Enhances control over generated images

Abstract

Advanced diffusion models have made notable progress in text-to-image compositional generation. However, it is still a challenge for existing models to achieve text-image alignment when confronted with complex text prompts. In this work, we highlight two factors that affect this alignment: the quality of the randomly initialized noise and the reliability of the generated controlling mask. We then propose PiCo (Pick-and-Control), a novel training-free approach with two key components to tackle these two factors. First, we develop a noise selection module to assess the quality of the random noise and determine whether the noise is suitable for the target text. A fast sampling strategy is utilized to ensure efficiency in the noise selection stage. Second, we introduce a referring mask module to generate pixel-level masks and to precisely modulate the cross-attention maps. The referring…
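The abstract says only that the referring mask module produces pixel-level masks that modulate the cross-attention maps; the exact modulation rule is not given here. One common way to apply such masks, assumed for this sketch and not confirmed as PiCo's rule, is to bias each token's pre-softmax attention score upward at pixels inside that token's mask and downward outside it. The function name `masked_cross_attention`, the `strength` parameter, and the per-token mask layout are all hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_attention(
    logits: Sequence[Sequence[float]],
    token_masks: Sequence[Optional[Sequence[bool]]],
    strength: float = 5.0,
) -> List[List[float]]:
    """Bias cross-attention scores with per-token pixel masks, then normalize.

    logits: [num_pixels][num_tokens] pre-softmax query-key scores.
    token_masks: one entry per token; a boolean mask over pixels, or None
        for tokens that should not be spatially constrained.
    strength: hypothetical bias magnitude added inside / subtracted outside a mask.
    """
    out = []
    for i, row in enumerate(logits):
        biased = []
        for j, v in enumerate(row):
            mask = token_masks[j]
            if mask is None:
                biased.append(v)  # unconstrained token: score left unchanged
            elif mask[i]:
                biased.append(v + strength)  # pixel lies inside this token's mask
            else:
                biased.append(v - strength)  # pixel lies outside this token's mask
        out.append(softmax(biased))  # each pixel's weights over tokens sum to 1
    return out
```

For example, with two pixels, a "cat" token masked to pixel 0 and a "dog" token masked to pixel 1, and all-zero logits, pixel 0 attends almost entirely to "cat" and pixel 1 to "dog". Biasing before the softmax keeps every row a valid distribution, which is why this sketch prefers it over rescaling probabilities afterwards.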

Taxonomy

Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion