Benchmarking and Evolving Reason-Reflect-Rectify for Reflective Visual Generation

Junjie Wang; Xinghua Lou; Jason Li; Ye Tian; Keyu Chen; Yulin Li; Bin Kang; Jacky Mai; Yanwei Li; Zhuotao Tian; Liqiang Nie

arXiv:2605.19639 · cs.CV · May 20, 2026

TL;DR

This paper introduces a framework and a benchmark for iterative, reflective visual generation. Through multi-round reasoning and rectification, it addresses the difficulty single-pass models have with complex prompts.

Contribution

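The paper formalizes the Reason-Reflect-Rectify (R^3) loop for multi-round visual generation, builds R^3-Bench (over 600 expert-annotated instances) to evaluate iterative reasoning and rectification, and proposes R^3-Refiner, a dual-stage framework built on GRPO, to improve how models reflect on and rectify their outputs.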
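The loop can be pictured with a minimal sketch. It assumes one Reason step that turns the prompt into a plan, followed by alternating Reflect (judge the current image and, if needed, emit a rectification instruction) and Rectify (edit the image with that instruction) until the judge is satisfied or a round budget runs out. All names here (`r3_loop`, `Verdict`, `Round`, the `toy_*` helpers) and the stopping rule are hypothetical, not the paper's interfaces; images are modeled as strings of words so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Hypothetical output of the Reflect step."""
    satisfied: bool                    # does the image already match the prompt?
    instruction: Optional[str] = None  # rectification instruction when it does not


@dataclass
class Round:
    image: str        # image judged in this round
    verdict: Verdict  # the judgment of that image


def r3_loop(prompt: str,
            reason: Callable[[str], str],
            generate: Callable[[str], str],
            reflect: Callable[[str, str], Verdict],
            rectify: Callable[[str, str], str],
            max_rounds: int = 3) -> List[Round]:
    """Run Reason once, then alternate Reflect and Rectify for up to max_rounds."""
    plan = reason(prompt)   # Reason: turn the prompt into an explicit plan
    image = generate(plan)  # first single-pass generation
    history: List[Round] = []
    for i in range(max_rounds):
        verdict = reflect(prompt, image)  # Reflect: judge the image against the prompt
        history.append(Round(image, verdict))
        # Stop when the judge is satisfied, gives no instruction, or the budget is spent.
        if verdict.satisfied or verdict.instruction is None or i == max_rounds - 1:
            break
        image = rectify(image, verdict.instruction)  # Rectify: edit per the instruction
    return history


# Toy stand-ins (hypothetical): "images" are strings of words.
def toy_reason(prompt: str) -> str:
    return prompt.lower()


def toy_generate(plan: str) -> str:
    return "a cat"  # deliberately drops the colour attribute from the plan


def toy_reflect(prompt: str, image: str) -> Verdict:
    # Any prompt word absent from the image counts as an error.
    missing = [w for w in prompt.lower().split() if w not in image.split()]
    if not missing:
        return Verdict(satisfied=True)
    return Verdict(satisfied=False, instruction=f"add {missing[0]}")


def toy_rectify(image: str, instruction: str) -> str:
    # Apply an "add <word>" instruction by appending the word.
    return f"{image} {instruction.split(' ', 1)[1]}"


rounds = r3_loop("a red cat", toy_reason, toy_generate, toy_reflect, toy_rectify)
print(len(rounds), rounds[-1].image)  # → 2 a cat red
```

Keeping Reflect and Rectify as separate callables makes it easy to score them separately, which matches the benchmark's distinction between spotting an error and producing an actionable instruction for it.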

Findings

1. State-of-the-art models can identify generation errors but fail to produce actionable rectification instructions.
2. R^3-Refiner improves the Reflective Verdict score by 12% and the Rectification score by 9%.
3. The framework improves visual generation quality across multiple models.

Abstract

Text-to-Image (T2I) models and Unified Multimodal Models (UMMs) have achieved remarkable progress in visual generation. However, their reliance on a single-pass generation paradigm limits their ability to handle complex prompts requiring iterative refinement. To enable multi-round Reflective Visual Generation (RVG), we formalize the Reason-Reflect-Rectify (R^3) loop as a core framework and introduce R^3-Bench, a benchmark of over 600 expert-annotated instances that quantifies iterative reasoning and rectification capabilities. Evaluation on R^3-Bench reveals a critical gap: while state-of-the-art models can identify generation errors, they fail to generate actionable rectification instructions. To bridge this gap, we propose R^3-Refiner, a dual-stage framework leveraging Group Relative Policy Optimization (GRPO) and a Hierarchical Reward Mechanism (HRM) to better align rectification…
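The core of GRPO, as generally described, is that several outputs are sampled for the same input and each one's reward is normalized against its own group, which removes the need for a learned critic. The sketch below shows only that advantage step, not the paper's training code. The reward values are made up for illustration, standing in for whatever the Hierarchical Reward Mechanism produces (its internals are not given here), and the function name `group_relative_advantages` is hypothetical.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages for one group of outputs sampled from the same input.

    Each reward is centred on the group mean and scaled by the group's standard
    deviation, so no learned value function (critic) is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical scalar rewards for four rectification instructions sampled
# for one benchmark instance.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # → [1.414, -1.414, 0.0, 0.0]
```

In GRPO these advantages then weight a PPO-style clipped policy-gradient objective. If every sample in a group receives the same reward, all advantages are zero and that group contributes no learning signal.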

Code & Models

xiaomoguhz/R3-Bench (GitHub)
