CountDiffusion: Text-to-Image Synthesis with Training-Free Counting-Guidance Diffusion

Yanyu Li; Pencheng Wan; Liang Han; Yaowei Wang; Liqiang Nie; Min Zhang

arXiv:2505.04347·cs.CV·May 8, 2025

TL;DR

CountDiffusion is a training-free method that enhances text-to-image diffusion models to accurately generate images with the correct number of objects, using a two-stage counting and correction process.

Contribution

The paper introduces a training-free framework that improves object-quantity accuracy in diffusion-based text-to-image synthesis.

Findings

01. Significantly improves object counting accuracy in generated images.

02. Can be integrated into existing diffusion models without retraining.

03. Demonstrates superior performance over baseline methods.

Abstract

Stable Diffusion has advanced text-to-image synthesis, but training models to generate images with accurate object quantity is still difficult due to the high computational cost and the challenge of teaching models the abstract concept of quantity. In this paper, we propose CountDiffusion, a training-free framework that aims to generate images with the correct object quantity from textual descriptions. CountDiffusion consists of two stages. In the first stage, the diffusion model produces an intermediate denoising result, from which the final synthesized image is predicted with one-step denoising, and a counting model counts the objects in this predicted image. In the second stage, a correction module corrects the object quantity by changing the attention map of the object with universal guidance. The proposed CountDiffusion can be plugged into any diffusion-based text-to-image…
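
The first stage described above relies on predicting the final image from an intermediate noisy sample in a single step. The abstract does not give the formula, so the following is only a minimal sketch under the assumption that the standard DDPM forward relation x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps is used and solved for x_0. The function names `predict_x0` and `count_error`, and the use of flat Python lists in place of image tensors, are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def predict_x0(x_t, eps_pred, alpha_bar_t):
    """One-step estimate of the clean sample from a noisy one.

    Assumption: the standard DDPM identity
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    solved for x_0. Inputs are flat lists of floats for simplicity.
    """
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must be in (0, 1]")
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(x - noise_scale * e) / signal_scale
            for x, e in zip(x_t, eps_pred)]


def count_error(predicted_count, target_count):
    """Signed count error (hypothetical helper).

    Positive: the predicted image has too many objects.
    Negative: it has too few. Zero: no correction is needed.
    """
    return predicted_count - target_count
```

In a full pipeline, the output of `predict_x0` would be decoded to an image and passed to a counting model; a nonzero `count_error` would then trigger the second-stage correction. Both of those components are outside the scope of this sketch.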

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Model Reduction and Neural Networks

Methods: Softmax · Attention Is All You Need · Diffusion