SpectralAR: Spectral Autoregressive Visual Generation
Yuanhui Huang, Weiliang Chen, Wenzhao Zheng, Yueqi Duan, Jie Zhou, Jiwen Lu

TL;DR
SpectralAR is a spectral autoregressive framework for visual generation. It orders image tokens from low to high frequency, which gives the token sequence a causal structure and keeps sequence modeling efficient.
Contribution
The paper proposes Nested Spectral Tokenization, which turns an image into ordered spectral tokens, and a coarse-to-fine autoregressive generation process over those tokens. The resulting sequences have a causal order that spatial patch sequences lack.
Findings
Achieves 3.02 gFID on ImageNet-1K with 64 tokens
Uses only 310M parameters for high-quality generation
Demonstrates effective coarse-to-fine spectral image reconstruction
Abstract
Autoregressive visual generation has garnered increasing attention due to its scalability and compatibility with other modalities compared with diffusion models. Most existing methods construct visual sequences as spatial patches for autoregressive generation. However, image patches are inherently parallel, contradicting the causal nature of autoregressive modeling. To address this, we propose a Spectral AutoRegressive (SpectralAR) visual generation framework, which realizes causality for visual sequences from the spectral perspective. Specifically, we first transform an image into ordered spectral tokens with Nested Spectral Tokenization, representing lower to higher frequency components. We then perform autoregressive generation in a coarse-to-fine manner with the sequences of spectral tokens. By considering different levels of detail in images, our SpectralAR achieves both sequence…
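The low-to-high frequency ordering described in the abstract can be illustrated with a fixed transform. The sketch below is not the paper's Nested Spectral Tokenization, which is a learned tokenizer; as a stand-in it uses a plain discrete cosine transform (DCT) on a single row of pixel values and reconstructs that row from progressively longer prefixes of its coefficients. The function names, the sample row, and the prefix lengths are all assumptions made for illustration.

```python
import math

def dct(signal):
    """Orthonormal DCT-II: coefficient k measures the content at frequency k."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    signal = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        signal.append(total)
    return signal

def nested_reconstructions(signal, prefix_lengths):
    """Rebuild the signal from the first m coefficients for each m.

    Each prefix contains every shorter prefix, so later reconstructions
    only add higher-frequency detail on top of earlier ones.
    """
    coeffs = dct(signal)
    results = []
    for m in prefix_lengths:
        kept = coeffs[:m] + [0.0] * (len(coeffs) - m)  # zero out higher frequencies
        results.append(idct(kept))
    return results

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical row of 8 pixel intensities (illustrative data only).
row = [10.0, 12.0, 15.0, 40.0, 42.0, 41.0, 20.0, 18.0]
prefixes = [1, 2, 4, 8]
reconstructions = nested_reconstructions(row, prefixes)
errors = [mse(row, r) for r in reconstructions]

for m, err in zip(prefixes, errors):
    print(f"first {m} coefficient(s): mse = {err:.4f}")
```

With one coefficient the reconstruction is just the mean intensity of the row; each longer prefix adds finer detail, and the error never increases because the transform is orthonormal. This is the sense in which a spectral sequence is ordered coarse to fine: every prefix is already a usable, lower-detail version of the whole, whereas a prefix of spatial patches covers only part of the image.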
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Domain Adaptation and Few-Shot Learning
Methods: Softmax · Attention Is All You Need · Diffusion
