Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

Hanbo Cheng; Limin Lin; Ruo Zhang; Yicheng Pan; Jun Du

arXiv:2605.14876 · cs.CV · May 18, 2026

TL;DR

The paper introduces CLVR (Closed-Loop Visual Reasoning), a framework that couples visual-language planning with pixel-level diffusion generation to improve complex text-to-image generation. It targets three weaknesses of existing multi-step approaches: unverified planning, unstable long-context optimization, and high inference latency.

Contribution

CLVR integrates step-level visual verification, reinforcement learning, and weight merging into a single system, with the aim of improving reasoning reliability, training stability, and inference efficiency in complex visual generation tasks.

Findings

01. CLVR outperforms existing open-source baselines on multiple benchmarks.
02. It approaches the performance of proprietary commercial models.
03. The framework enables scalable, test-time complex visual generation.

Abstract

Despite rapid advancements, current text-to-image (T2I) models predominantly rely on a single-step generation paradigm, which struggles with complex semantics and faces diminishing returns from parameter scaling. While recent multi-step reasoning approaches show promise, they are hindered by ungrounded planning hallucinations lacking verification, monolithic post-hoc reflection, long-context optimization instabilities, and prohibitive inference latency. To overcome these bottlenecks, we propose the Closed-Loop Visual Reasoning (CLVR) framework, a comprehensive system that deeply couples visual-language logical planning with pixel-level diffusion generation. CLVR introduces an automated data engine with step-level visual verification to synthesize reliable reasoning trajectories, and proposes Proxy Prompt Reinforcement Learning (PPRL) to resolve long-context optimization instabilities by…
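The idea of step-level visual verification mentioned in the abstract can be illustrated with a small sketch. This is not the authors' data engine: the names `build_trajectory`, `Step`, `Trajectory`, `toy_generate`, and `toy_verify`, the acceptance `threshold`, and the retry policy are all hypothetical, and strings stand in for images. The sketch only shows the general closed-loop pattern, in which each planned step is generated, checked by a verifier, and retried or rejected before the next step runs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    instruction: str  # one planning step, e.g. "add a red cube on the left"
    image: str        # stand-in for the image produced at this step
    score: float      # verifier score for this step


@dataclass
class Trajectory:
    prompt: str
    steps: List[Step] = field(default_factory=list)


def build_trajectory(
    prompt: str,
    plan: List[str],
    generate: Callable[[str, Optional[str], int], str],
    verify: Callable[[str, str], float],
    threshold: float = 0.5,   # hypothetical acceptance threshold
    max_retries: int = 2,     # hypothetical retry budget per step
) -> Optional[Trajectory]:
    """Run each planned step and keep it only if the verifier accepts it.

    Returns None as soon as a step cannot be verified, so trajectories
    containing an unverified step are discarded rather than kept.
    """
    traj = Trajectory(prompt)
    current: Optional[str] = None  # latest accepted image, if any
    for instruction in plan:
        accepted = None
        for attempt in range(max_retries + 1):
            candidate = generate(instruction, current, attempt)
            score = verify(instruction, candidate)
            if score >= threshold:
                accepted = Step(instruction, candidate, score)
                break
        if accepted is None:
            return None
        traj.steps.append(accepted)
        current = accepted.image
    return traj


# Toy stand-ins (assumptions for illustration only).
def toy_generate(instruction: str, previous: Optional[str], attempt: int) -> str:
    base = previous or "blank"
    # Simulate a failure on the first attempt of any "hard" step.
    tag = "bad" if ("hard" in instruction and attempt == 0) else "ok"
    return f"{base}|{instruction}:{tag}"


def toy_verify(instruction: str, image: str) -> float:
    return 1.0 if image.endswith(":ok") else 0.0


if __name__ == "__main__":
    result = build_trajectory("demo", ["a", "hard b"], toy_generate, toy_verify)
    print(None if result is None else [s.image for s in result.steps])
```

In this toy run the second step fails verification once and succeeds on retry; with `max_retries=0` the same plan would be rejected outright. A real system would replace the stubs with a diffusion model and a vision-language verifier.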
