Learning to Expand Images for Efficient Visual Autoregressive Modeling

Ruiqing Yang; Kaixin Zhang; Zheng Zhang; Shan You; Tao Huang

arXiv:2511.15499 · cs.CV · November 20, 2025

Open Access

TL;DR

This paper introduces EAR, a biologically inspired image generation method that expands tokens from the center outward, enabling efficient parallel decoding and improved quality in autoregressive visual models.

Contribution

We propose EAR, a generation paradigm that unfolds image tokens in a spiral from the center outward and pairs this ordering with length-adaptive decoding, improving both the efficiency and the quality of autoregressive image generation.

Findings

01 Achieves state-of-the-art fidelity-efficiency trade-offs on ImageNet

02 Reduces computational cost compared to traditional token-by-token methods

03 Aligns generation order with perceptual relevance for better quality

Abstract

Autoregressive models have recently shown great promise in visual generation by leveraging discrete token sequences akin to language modeling. However, existing approaches often suffer from inefficiency, either due to token-by-token decoding or the complexity of multi-scale representations. In this work, we introduce Expanding Autoregressive Representation (EAR), a novel generation paradigm that emulates the human visual system's center-outward perception pattern. EAR unfolds image tokens in a spiral order from the center and progressively expands outward, preserving spatial continuity and enabling efficient parallel decoding. To further enhance flexibility and speed, we propose a length-adaptive decoding strategy that dynamically adjusts the number of tokens predicted at each step. This biologically inspired design not only reduces computational cost but also improves generation…
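To make the center-outward ordering concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes a rectangular token grid and builds the order by reversing an ordinary outside-in clockwise spiral, so consecutive tokens are always grid neighbors. The function names (`spiral_inward`, `center_outward_order`, `group_tokens`) and the fixed step sizes are hypothetical and chosen for illustration only. The abstract describes the number of tokens per step as adjusted dynamically, and that rule is not given here, so the grouping below is only a static stand-in.

```python
def spiral_inward(h, w):
    """Return (row, col) coordinates of an h x w grid in clockwise
    outside-in spiral order, starting at the top-left corner."""
    top, bottom, left, right = 0, h - 1, 0, w - 1
    order = []
    while top <= bottom and left <= right:
        for c in range(left, right + 1):          # top edge, left to right
            order.append((top, c))
        for r in range(top + 1, bottom + 1):      # right edge, downward
            order.append((r, right))
        if top < bottom:
            for c in range(right - 1, left - 1, -1):  # bottom edge, right to left
                order.append((bottom, c))
        if left < right:
            for r in range(bottom - 1, top, -1):  # left edge, upward
                order.append((r, left))
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order


def center_outward_order(h, w):
    """Assumed construction: reverse the inward spiral so the sequence
    starts at (or next to) the grid center and expands outward."""
    return spiral_inward(h, w)[::-1]


def group_tokens(order, sizes):
    """Split the ordered tokens into decoding steps of the given sizes.
    The sizes are a hypothetical fixed schedule, not the paper's
    length-adaptive rule; leftover tokens form one final step."""
    groups, i = [], 0
    for s in sizes:
        if i >= len(order):
            break
        groups.append(order[i:i + s])
        i += s
    if i < len(order):
        groups.append(order[i:])
    return groups


order = center_outward_order(4, 4)
steps = group_tokens(order, [1, 2, 4, 8])
```

In this sketch, each entry of `steps` is a set of grid positions that would be predicted together in one parallel decoding step, with later steps covering rings farther from the center.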

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning