CountLoop: Training-Free High-Instance Image Generation via Iterative Agent Guidance

Anindya Mondal; Ayan Banerjee; Sauradip Nag; Josep Llados; Xiatian Zhu; Anjan Dutta

arXiv:2508.16644·cs.CV·April 14, 2026

TL;DR

COUNTLOOP is a training-free method that uses iterative feedback from vision-language models to generate images with precise object counts and high spatial quality, especially in dense scenes.

Contribution

The paper introduces a training-free framework that combines VLM-based scene layout planning with critic-driven iterative refinement for high-instance image generation.

Findings

01 Reduces counting error by up to 57% in evaluations on COCO-Count, T2I-CompBench, and two newly introduced high-instance benchmarks.

02 Achieves the highest or comparable spatial quality scores.

03 Maintains photorealism in densely occluded scenes.

Abstract

Diffusion models excel at photorealistic synthesis but struggle with precise object counts, especially in high-density settings. We introduce COUNTLOOP, a training-free framework that achieves precise instance control through iterative, structured feedback. Our method alternates between synthesis and evaluation: a VLM-based planner generates structured scene layouts, while a VLM-based critic provides explicit feedback on object counts, spatial arrangements, and visual quality to refine the layout iteratively. Instance-driven attention masking and cumulative attention composition further prevent semantic leakage, ensuring clear object separation even in densely occluded scenes. Evaluations on COCO-Count, T2I-CompBench, and two newly introduced high-instance benchmarks show that COUNTLOOP reduces counting error by up to 57% and achieves the highest or comparable spatial quality scores…
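The alternation between synthesis and evaluation described in the abstract can be pictured as a simple control loop. The sketch below is illustrative only and is not the authors' implementation: `refine_layout`, `Feedback`, `ToyPlanner`, `toy_synthesize`, and `toy_critique` are hypothetical names, the box-list layout format is an assumption, and the toy stubs stand in for the VLM planner, the diffusion model, and the VLM critic. It also tracks only the count signal, whereas the abstract says the critic additionally reports on spatial arrangement and visual quality.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical layout format: one (x, y, w, h) box per instance, normalized to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class Feedback:
    """Critic output (assumed shape); only the observed count is modeled here."""
    observed_count: int


def refine_layout(
    target_count: int,
    plan: Callable[[int, Optional[Feedback]], List[Box]],
    synthesize: Callable[[List[Box]], List[Box]],
    critique: Callable[[List[Box]], Feedback],
    max_iters: int = 5,
) -> Tuple[int, List[Box], List[Box]]:
    """Alternate planning, synthesis, and critique; return the best attempt
    as (absolute count error, image, layout)."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    feedback: Optional[Feedback] = None
    best: Optional[Tuple[int, List[Box], List[Box]]] = None
    for _ in range(max_iters):
        layout = plan(target_count, feedback)   # planner proposes or revises a layout
        image = synthesize(layout)              # generator renders the layout
        feedback = critique(image)              # critic reports what it sees
        error = abs(feedback.observed_count - target_count)
        if best is None or error < best[0]:
            best = (error, image, layout)
        if error == 0:                          # stop once the count matches
            break
    assert best is not None
    return best


class ToyPlanner:
    """Stand-in for a VLM planner: adjusts the number of requested boxes
    by the critic's count error from the previous round."""

    def __init__(self) -> None:
        self.requested = 0

    def plan(self, target_count: int, feedback: Optional[Feedback]) -> List[Box]:
        if feedback is None:
            self.requested = target_count
        else:
            self.requested += target_count - feedback.observed_count
        n = max(self.requested, 0)
        cols = max(1, math.ceil(math.sqrt(n)))
        size = 1.0 / cols
        # Place the boxes on a regular grid, row by row.
        return [((i % cols) * size, (i // cols) * size, size, size) for i in range(n)]


def toy_synthesize(layout: List[Box]) -> List[Box]:
    """Stand-in for a generator that fails to render every fourth instance.
    The 'image' is represented simply as the list of boxes that were rendered."""
    return [box for i, box in enumerate(layout) if i % 4 != 3]


def toy_critique(image: List[Box]) -> Feedback:
    """Stand-in for a VLM critic that counts the rendered instances."""
    return Feedback(observed_count=len(image))


planner = ToyPlanner()
error, image, layout = refine_layout(12, planner.plan, toy_synthesize, toy_critique)
print(error, len(image), len(layout))  # prints: 0 12 15
```

In this toy run the first pass requests 12 boxes and renders only 9, so the planner requests 15 on the second pass and the rendered count reaches the target of 12. Keeping the best attempt rather than the last one is a design choice of the sketch, so that a late regression cannot make the returned result worse.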

Code & Models

Datasets

anindyamondal/COUNTLOOP (dataset, 11 downloads)
