Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation

Joonhyung Park; Hyeongwon Jang; Joowon Kim; Eunho Yang

arXiv:2511.21185·cs.CV·November 27, 2025

Open Access

TL;DR

This paper introduces GridAR, a test-time scaling framework for autoregressive image generation that improves both quality and efficiency through progressive, grid-based candidate generation and prompt reformulation.

Contribution

GridAR is a new test-time scaling method that enhances visual autoregressive models through grid-partitioned progressive generation and prompt reformulation strategies.

Findings

01. Outperforms Best-of-N while using less computation.

02. Achieves 14.4% higher quality on T2I-CompBench++ with N=4.

03. Shows a 13.9% improvement in semantic preservation in image editing.

Abstract

Recent visual autoregressive (AR) models have shown promising capabilities in text-to-image generation, operating in a manner similar to large language models. While test-time computation scaling has brought remarkable success in enabling reasoning-enhanced outputs for challenging natural language tasks, its adaptation to visual AR models remains unexplored and poses unique challenges. Naively applying test-time scaling strategies such as Best-of-N can be suboptimal: they consume full-length computation on erroneous generation trajectories, while the raster-scan decoding scheme lacks a blueprint of the entire canvas, limiting scaling benefits as only a few prompt-aligned candidates are generated. To address these issues, we introduce GridAR, a test-time scaling framework designed to elicit the best possible results from visual AR models. GridAR employs a grid-partitioned progressive generation…
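The cost argument against Best-of-N in the abstract can be illustrated with a small sketch. It is not GridAR's procedure, which is only partially described here. It shows a plain Best-of-N baseline next to a token count for a generic scheme that discards weak candidates after each partial stage. The sampler `toy_generate`, the verifier `toy_score`, and the parameters `stages` and `keep` are all hypothetical stand-ins introduced for illustration.

```python
import random
from typing import Callable, List

Image = List[int]  # toy stand-in: an "image" is a flat list of token ids


def toy_generate(prompt: str, rng: random.Random, num_tokens: int = 16) -> Image:
    """Hypothetical sampler: draws token ids one by one, in raster order."""
    return [rng.randrange(256) for _ in range(num_tokens)]


def toy_score(prompt: str, image: Image) -> float:
    """Hypothetical verifier: higher is better (here, just the mean token id)."""
    return sum(image) / len(image)


def best_of_n(
    prompt: str,
    n: int,
    generate: Callable[[str, random.Random], Image] = toy_generate,
    score: Callable[[str, Image], float] = toy_score,
    seed: int = 0,
) -> Image:
    """Plain Best-of-N: decode n full images, then keep the top-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda img: score(prompt, img))


def best_of_n_tokens(n: int, tokens_per_image: int) -> int:
    """Every candidate is decoded to full length, good or bad."""
    return n * tokens_per_image


def pruned_tokens(n: int, tokens_per_image: int, stages: int, keep: int) -> int:
    """Tokens decoded when at most `keep` candidates survive each stage.

    Assumption for this sketch: the image is decoded in `stages` equal
    parts and pruning happens after every part.
    """
    per_stage = tokens_per_image // stages
    alive, total = n, 0
    for _ in range(stages):
        total += alive * per_stage  # decode this stage for all survivors
        alive = min(alive, keep)    # drop weak candidates before continuing
    return total


if __name__ == "__main__":
    best = best_of_n("a red cube on a blue sphere", n=4)
    print(len(best), toy_score("", best))
    print(best_of_n_tokens(4, 1024), pruned_tokens(4, 1024, stages=4, keep=1))
```

Under these toy assumptions, four candidates of 1024 tokens cost 4096 decoded tokens with Best-of-N, versus 1792 when three of them are dropped after the first quarter of the image. The count ignores the cost of running the verifier itself, which a real comparison would need to include.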

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning