Generation Navigator: A State-Aware Agentic Framework for Image Generation
Jinming Liu, Ruoyu Feng, Yuqi Wang, Wenjun Zeng, Xin Jin

TL;DR
This paper introduces Generation Navigator, a state-aware agentic framework for image generation that learns to adaptively steer the process through multi-turn interactions, improving fidelity to user intent.
Contribution
It proposes a novel reinforcement learning approach, PRE-GRPO, to train a dynamic, multi-turn T2I agent that effectively balances image quality, trajectory retention, and turn efficiency.
Findings
Achieves a score of 0.90 on the WISE benchmark.
Reaches 79.06% reasoning accuracy on T2I-ReasonBench.
Substantially outperforms existing methods in multi-turn image generation.
Abstract
Despite rapid advances in text-to-image generation, faithfully realizing user intent remains challenging, often requiring manual multi-turn trial and error. To automate this process, existing systems rely on either simple prompt rewriting or closed-loop agents driven by hand-crafted rules, rather than learning to adapt actions to the evolving generation process. In this paper, we reformulate image generation as a state-conditioned action-making problem and propose Generation Navigator, a multi-turn T2I agent that learns to dynamically steer the generation trajectory and output the next action. However, training this agent via reinforcement learning introduces a critical credit assignment challenge: naively rewarding a trajectory based solely on a single state assigns equal credit to all actions in the rollout, ignores the quality dynamics across turns, and fails to distinguish actions…
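The credit assignment problem described in the abstract can be illustrated with a small sketch. It assumes a standard GRPO-style group normalization, in which each rollout's advantage is its reward minus the group mean, divided by the group standard deviation, and it scores each rollout with a single scalar taken from one state. The function names, turn counts, and reward values are hypothetical and chosen only for illustration. This is the naive baseline the abstract criticizes, not the paper's PRE-GRPO method.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each rollout's reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def naive_per_turn_credit(turn_counts, rewards):
    """Broadcast one trajectory-level advantage to every turn of that rollout.

    Every action in a rollout receives the same credit, regardless of
    whether that particular turn improved or degraded the image.
    """
    advantages = group_normalized_advantages(rewards)
    return [[adv] * n for adv, n in zip(advantages, turn_counts)]


# Hypothetical group of three rollouts with 1, 3, and 2 turns,
# each scored once from a single state.
turn_counts = [1, 3, 2]
rewards = [0.2, 0.8, 0.5]

credit = naive_per_turn_credit(turn_counts, rewards)
for i, per_turn in enumerate(credit):
    print(f"rollout {i}: {[round(c, 3) for c in per_turn]}")
```

In this sketch all three turns of the second rollout get identical credit, so a turn that harmed the image is rewarded exactly as much as the turn that fixed it. The signal also carries no information about how quality changed across turns or how many turns were spent.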
