Boosting Text-to-Image Diffusion Models via Core Token Attention-Based Seed Selection

Yunzhe Zhang, Hongfu Liu, and Pengyu Hong

arXiv:2605.19532 · cs.CV · May 20, 2026

TL;DR

This paper introduces ABSS, a seed selection method based on attention dynamics over prompt core tokens, which improves image quality and alignment in text-to-image diffusion models without additional training.

Contribution

The paper proposes a training-free seed-ranking technique that uses cross-attention over prompt core tokens to improve the output of existing diffusion models at inference time.

Findings

01. ABSS improves text-image alignment and visual quality across benchmarks.

02. It requires no fine-tuning or changes to initial noise.

03. ABSS consistently outperforms baseline seed selection methods.

Abstract

Text-to-image diffusion models can synthesize high-quality images, yet the outcome is notoriously sensitive to the random seed: different initial seeds often yield large variations in image quality and prompt-image alignment. We revisit this "seed effect" and show that attention dynamics over prompt core tokens (the content-bearing words), measured during the first few denoising steps, strongly predict final generation quality. Building on this observation, we introduce Attention-Based Seed Selection (ABSS), a training-free, plug-and-play method that ranks seeds for a given prompt by leveraging cross-attention to core tokens during the denoising process. ABSS requires no fine-tuning and does not alter the initial noise; it scores and ranks all candidate seeds, keeps only the top-k for full generation, and discards the rest, without relying on a fixed accept/reject threshold. Operating…
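The rank-and-keep step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the scoring formula, so the score used here (mean attention weight on core tokens, averaged over the early denoising steps) is an assumption, and the names `core_token_score`, `select_top_k_seeds`, `attn_by_seed`, and `core_idx` are hypothetical. The attention values are toy numbers rather than model outputs.

```python
from typing import Dict, List, Sequence

# One attention snapshot per early denoising step; each snapshot holds one
# weight per prompt token. In practice these would be read from the model's
# cross-attention layers, which this sketch does not attempt to do.
AttentionTrace = Sequence[Sequence[float]]


def core_token_score(trace: AttentionTrace, core_idx: Sequence[int]) -> float:
    """ASSUMED score: mean attention on core tokens, averaged over steps."""
    per_step = [sum(step[i] for i in core_idx) / len(core_idx) for step in trace]
    return sum(per_step) / len(per_step)


def select_top_k_seeds(
    attn_by_seed: Dict[int, AttentionTrace],
    core_idx: Sequence[int],
    k: int,
) -> List[int]:
    """Rank all candidate seeds by score and keep the best k.

    Selection is purely by rank, so no accept/reject threshold is involved.
    Ties are broken by the smaller seed value to keep the result deterministic.
    """
    scores = {seed: core_token_score(t, core_idx) for seed, t in attn_by_seed.items()}
    ranked = sorted(scores, key=lambda seed: (-scores[seed], seed))
    return ranked[:k]


if __name__ == "__main__":
    # Toy data: three seeds, two early steps, three prompt tokens.
    # Tokens 1 and 2 are treated as the core (content-bearing) tokens.
    toy_attention = {
        0: [[0.1, 0.5, 0.4], [0.1, 0.6, 0.3]],
        1: [[0.6, 0.2, 0.2], [0.5, 0.3, 0.2]],
        2: [[0.2, 0.4, 0.4], [0.1, 0.4, 0.5]],
    }
    print(select_top_k_seeds(toy_attention, core_idx=[1, 2], k=2))
```

Under this assumed score, only the seeds returned by `select_top_k_seeds` would go on to full generation; the remaining candidates are dropped after the first few steps, which is where the compute saving would come from.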
