From "What" to "How": Constrained Reasoning for Autoregressive Image Generation

Ruxue Yan; Xubo Liu; Wenya Guo; Zhengkun Zhang; Ying Zhang; Xiaojie Yuan

arXiv:2603.02712 · cs.CV · March 4, 2026

TL;DR

This paper introduces CoR-Painter, a framework for autoregressive image generation that uses constrained reasoning to explicitly model "how" an image should be structured before deciding "what" to depict, yielding more coherent and spatially accurate results.

Contribution

The paper proposes a "How-to-What" paradigm built on constrained reasoning, paired with a dual-objective strategy, to improve spatial coherence and overall quality in autoregressive image synthesis.

Findings

1. Achieves state-of-the-art performance on multiple benchmarks.
2. Significantly improves spatial metrics, e.g., +5.41% on T2I-CompBench.
3. Enhances image coherence and structural accuracy.

Abstract

Autoregressive image generation has recently improved with the introduction of chain-of-thought reasoning and reinforcement learning. However, current methods merely specify "What" details to depict by rewriting the input prompt, yet fundamentally fail to reason about "How" to structure the overall image. This inherent limitation gives rise to persistent issues, such as spatial ambiguity that directly causes unrealistic object overlaps. To bridge this gap, we propose CoR-Painter, a novel framework that pioneers a "How-to-What" paradigm by introducing Constrained Reasoning to guide autoregressive generation. Specifically, it first deduces "How to draw" by deriving a set of visual constraints from the input prompt, which explicitly govern spatial relationships, key attributes, and compositional rules. These constraints steer the subsequent generation of a detailed description "What to draw",…
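The abstract describes a two-stage flow: first derive "How to draw" constraints (spatial relationships, key attributes, compositional rules), then let them steer the "What to draw" description. This page does not say how CoR-Painter represents or checks those constraints, so the following is only a minimal sketch under assumed choices. `Constraints`, `constraints_to_context`, `check_spatial`, the bounding-box `Box` layout format, the relation vocabulary, and the 0.3 IoU threshold are all hypothetical and do not come from the paper. The sketch shows one way a constraint set could be serialized to condition the description step, and one concrete way to detect the kind of relation violations and "unrealistic object overlaps" the abstract mentions.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in normalized image coordinates (y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Clamp at zero so a non-overlapping "intersection" has no area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0),
                min(a.x1, b.x1), min(a.y1, b.y1)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Constraints:
    """Hypothetical "How to draw" schema covering the three constraint types
    named in the abstract: spatial relationships, key attributes, compositional rules."""
    spatial: list[tuple[str, str, str]] = field(default_factory=list)  # (subject, relation, object)
    attributes: dict[str, list[str]] = field(default_factory=dict)     # object -> attributes
    rules: list[str] = field(default_factory=list)                      # free-text rules


def constraints_to_context(c: Constraints) -> str:
    """Serialize constraints into text that could condition the "What to draw" step."""
    lines = [f"- {s} is {r} {o}" for s, r, o in c.spatial]
    lines += [f"- {obj}: {', '.join(attrs)}" for obj, attrs in c.attributes.items()]
    lines += [f"- {rule}" for rule in c.rules]
    return "Constraints:\n" + "\n".join(lines)


# Relation tests on box centers; "above" means a smaller y in image coordinates.
RELATIONS = {
    "left of": lambda a, b: a.center()[0] < b.center()[0],
    "right of": lambda a, b: a.center()[0] > b.center()[0],
    "above": lambda a, b: a.center()[1] < b.center()[1],
    "below": lambda a, b: a.center()[1] > b.center()[1],
}


def check_spatial(c: Constraints, layout: dict[str, Box], max_iou: float = 0.3) -> list[str]:
    """Return human-readable violations of a layout against the spatial constraints."""
    violations = []
    for s, r, o in c.spatial:
        holds = RELATIONS.get(r)
        if holds is None:
            raise ValueError(f"unsupported relation: {r!r}")
        if not holds(layout[s], layout[o]):
            violations.append(f"{s} should be {r} {o}")
    # Flag heavy overlap between any two objects (the threshold is an arbitrary assumption).
    for a, b in combinations(sorted(layout), 2):
        v = iou(layout[a], layout[b])
        if v > max_iou:
            violations.append(f"{a} and {b} overlap (IoU={v:.2f})")
    return violations


if __name__ == "__main__":
    c = Constraints(
        spatial=[("cat", "left of", "dog")],
        attributes={"cat": ["black"]},
        rules=["objects must not overlap"],
    )
    print(constraints_to_context(c))
    good = {"cat": Box(0.05, 0.4, 0.45, 0.9), "dog": Box(0.55, 0.4, 0.95, 0.9)}
    bad = {"cat": Box(0.5, 0.4, 0.9, 0.9), "dog": Box(0.45, 0.4, 0.85, 0.9)}
    print(check_spatial(c, good))  # []
    print(check_spatial(c, bad))
```

In CoR-Painter the constraints are derived from the input prompt by the model itself; here they are written by hand only to make the data flow concrete. Checking relations on box centers and overlap via IoU is a deliberately simple choice for illustration, not the paper's evaluation procedure.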

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning