Detection-Driven Object Count Optimization for Text-to-Image Diffusion Models
Oz Zafar, Yuval Cohen, Lior Wolf, Idan Schwartz

TL;DR
This paper introduces a novel framework for controlling object counts in text-to-image diffusion models by leveraging pre-trained counting and detection models, improving robustness, efficiency, and accuracy without extensive retraining.
Contribution
The authors propose a detection-driven optimization method that enhances object count control in diffusion models, addressing limitations of prior supervised and iterative approaches.
Findings
Improved object counting accuracy across diverse categories.
Enhanced robustness to viewpoint and proportion variations.
Reduced computational cost through token reuse.
Abstract
Accurately controlling object count in text-to-image generation remains a key challenge. Supervised methods often fail, as training data rarely covers all count variations. Methods that manipulate the denoising process to add or remove objects can help; however, they still require labeled data, limit robustness and image quality, and rely on a slow, iterative process. Pre-trained differentiable counting models that rely on soft object density summation exist and could steer generation, but employing them presents three main challenges: (i) they are pre-trained on clean images, making them less effective during denoising steps that operate on noisy inputs; (ii) they are not robust to viewpoint changes; and (iii) optimization is computationally expensive, requiring repeated model evaluations per image. We propose a new framework that uses pre-trained object counting techniques and…
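The abstract describes differentiable counters that estimate a count by summing a soft object density map, and suggests such a count could steer generation. The toy below is a minimal sketch of that general idea under stated assumptions. It is not the authors' framework. The "counter" (`density_map`, a softplus over an 8x8 latent grid), the squared-error loss, the learning rate, and the step count are all hypothetical choices made for illustration. A real pipeline would backpropagate through a deep counting network and the diffusion model, and would also have to handle noisy inputs, which this toy ignores.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def density_map(latent: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a pre-trained counting network.

    Maps each latent cell to a non-negative density via softplus.
    A real counter would be a deep network applied to the generated image.
    """
    return np.logaddexp(0.0, latent)  # softplus(x) = log(1 + e^x), numerically stable


def soft_count(latent: np.ndarray) -> float:
    """Soft object count: the sum of the density map."""
    return float(density_map(latent).sum())


def count_loss_grad(latent: np.ndarray, target: float):
    """Squared count error and its gradient with respect to the latent.

    d softplus(x)/dx = sigmoid(x), so by the chain rule
    d/dz (count - target)^2 = 2 * (count - target) * sigmoid(z).
    """
    err = soft_count(latent) - target
    return err ** 2, 2.0 * err * sigmoid(latent)


def steer(latent: np.ndarray, target: float, lr: float = 0.5, steps: int = 200) -> np.ndarray:
    """Plain gradient descent on the latent toward the target count."""
    z = latent.copy()
    for _ in range(steps):
        _, grad = count_loss_grad(z, target)
        z -= lr * grad
    return z


rng = np.random.default_rng(0)
z0 = -3.0 + 0.5 * rng.standard_normal((8, 8))  # hypothetical 8x8 latent grid
z1 = steer(z0, target=5.0)
print(f"before: {soft_count(z0):.2f}  after: {soft_count(z1):.2f}")
```

Because the count is a differentiable sum, every cell receives a gradient proportional to its own density slope. The steering signal is therefore spread over the whole map rather than tied to discrete detections. Each optimization step also requires one more counter evaluation, which illustrates why the abstract lists computational cost as a challenge.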
Taxonomy
Topics: Image Retrieval and Classification Techniques · Data Management and Algorithms · Advanced Image and Video Retrieval Techniques
Methods: Diffusion
