Agentic Flow Steering and Parallel Rollout Search for Spatially Grounded Text-to-Image Generation
Ping Chen, Daoxuan Zhang, Xiangming Wang, Yungeng Liu, Haijin Zeng, Yongyong Chen

TL;DR
This paper introduces AFS-Search, a training-free, closed-loop framework for spatially grounded text-to-image generation that improves accuracy and speed by dynamically steering and exploring multiple generation trajectories using a vision-language model as a semantic critic.
Contribution
The paper presents a training-free, closed-loop search framework that combines flow steering with parallel rollout to improve spatial grounding and semantic accuracy in T2I generation.
Findings
Achieves state-of-the-art results on three benchmarks.
Significantly improves performance of FLUX.1-dev.
Offers a faster variant with competitive results.
Abstract
Precise Text-to-Image (T2I) generation has achieved great success but is hindered by the limited relational reasoning of static text encoders and by error accumulation in open-loop sampling. Without real-time feedback, initial semantic ambiguities along the Ordinary Differential Equation (ODE) trajectory inevitably escalate into stochastic deviations from spatial constraints. To bridge this gap, we introduce AFS-Search (Agentic Flow Steering and Parallel Rollout Search), a training-free closed-loop framework built upon FLUX.1-dev. AFS-Search combines parallel rollout search with a flow steering mechanism that leverages a Vision-Language Model (VLM) as a semantic critic to diagnose intermediate latents and dynamically steer the velocity field via precise spatial grounding. Complementarily, we formulate T2I generation as a sequential decision-making process,…
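The closed-loop idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation and uses neither FLUX.1-dev nor a real VLM: the 2-D "latent", the `PRIOR` and `TARGET` points, the `critic` function, and the steering strengths are all hypothetical stand-ins chosen only to show how a critic's feedback might be added to a velocity field during Euler integration, and how several candidate rollouts might be compared before committing to one.

```python
import math

# All names and values below are hypothetical, for illustration only.
PRIOR = (1.0, 0.0)    # where the un-steered toy flow drifts
TARGET = (0.0, 1.0)   # the layout the toy critic prefers


def base_velocity(x):
    # Toy stand-in for a pretrained velocity field: pull toward PRIOR.
    return tuple(p - xi for p, xi in zip(PRIOR, x))


def critic(x):
    # Toy stand-in for a VLM critic: returns (score, correction direction).
    direction = tuple(g - xi for g, xi in zip(TARGET, x))
    return -math.hypot(*direction), direction


def euler_step(x, dt, strength):
    # One Euler step of the ODE with the critic's correction added
    # to the base velocity, scaled by the steering strength.
    v = base_velocity(x)
    _, d = critic(x)
    return tuple(xi + dt * (vi + strength * di) for xi, vi, di in zip(x, v, d))


def rollout(x, steps, dt, strength):
    # Integrate for a fixed number of steps with one steering strength.
    for _ in range(steps):
        x = euler_step(x, dt, strength)
    return x


def search(x0, steps=20, checkpoint=5, strengths=(0.0, 1.0, 4.0)):
    # At each checkpoint, roll every candidate strength to the end,
    # score the outcomes, and commit to the best one until the next checkpoint.
    dt = 1.0 / steps
    x = x0
    for start in range(0, steps, checkpoint):
        remaining = steps - start
        best = max(strengths,
                   key=lambda s: critic(rollout(x, remaining, dt, s))[0])
        x = rollout(x, min(checkpoint, remaining), dt, best)
    return x


open_loop = rollout((0.0, 0.0), 20, 1.0 / 20, 0.0)  # no feedback
steered = search((0.0, 0.0))                        # feedback + rollouts
```

In this toy setting the steered trajectory ends closer to `TARGET` than the open-loop one, which mirrors the abstract's argument that feedback during sampling can correct early deviations. The candidate rollouts are evaluated one after another here; in a real system they would presumably be batched and run in parallel.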
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling
