From Sequential to Spatial: Reordering Autoregression for Efficient Visual Generation
Siyang Wang, Hanting Li, Wei Li, Jie Hu, Xinghao Chen, Feng Zhao

TL;DR
RadAR introduces a radial, parallel prediction framework with nested attention to accelerate autoregressive visual generation, maintaining spatial coherence and reducing inference time.
Contribution
The paper proposes RadAR, a novel radial topology-based parallel autoregressive framework with nested attention, enhancing efficiency while preserving visual scene structure.
Findings
RadAR achieves faster inference than traditional autoregressive models.
The nested attention mechanism improves prediction consistency.
RadAR maintains high-quality visual generation with increased parallelization.
Abstract
Inspired by the remarkable success of autoregressive models in language modeling, this paradigm has been widely adopted in visual generation. However, the sequential token-by-token decoding mechanism inherent in traditional autoregressive models leads to low inference efficiency. In this paper, we propose RadAR, an efficient and parallelizable framework designed to accelerate autoregressive visual generation while preserving its representational capacity. Our approach is motivated by the observation that visual tokens exhibit strong local dependencies and spatial correlations with their neighbors, a property not fully exploited in standard raster-scan decoding orders. Specifically, we organize the generation process around a radial topology: an initial token is selected as the starting point, and all other tokens are systematically grouped into multiple concentric rings according to…
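The ring grouping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is cut off before it states the grouping criterion, so the sketch assumes that a token's ring index is its Chebyshev distance from the starting token on a 2D token grid, which gives square rings. The function name `ring_groups` and its parameters are hypothetical. Under this assumption, each ring would be one parallel prediction step, proceeding outward from the start.

```python
from collections import defaultdict


def ring_groups(height, width, start):
    """Group grid positions into concentric rings around `start`.

    Assumption (not from the paper): the ring index of a token is its
    Chebyshev distance to the starting token, so rings are square.
    Returns a list of rings, innermost first; each ring is a list of
    (row, col) positions.
    """
    start_row, start_col = start
    rings = defaultdict(list)
    for row in range(height):
        for col in range(width):
            distance = max(abs(row - start_row), abs(col - start_col))
            rings[distance].append((row, col))
    # Sort by distance so generation would proceed from the center outward.
    return [rings[d] for d in sorted(rings)]


if __name__ == "__main__":
    # Hypothetical 5x5 token grid with the starting token at the center.
    for step, ring in enumerate(ring_groups(5, 5, (2, 2))):
        print(f"step {step}: {len(ring)} token(s)")
```

On this hypothetical 5x5 grid, the 25 tokens fall into 3 rings, compared with 25 sequential steps for raster-scan decoding. The actual ring definition and step count in RadAR may differ.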
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Data Visualization and Analytics
