Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective
Yuxin Mao, Zhen Qin, Jinxing Zhou, Hui Deng, Xuyang Shen, Bin Fan, Jing Zhang, Yiran Zhong, Yuchao Dai

TL;DR
This paper introduces LASAD, a novel linear attention mechanism that preserves spatial relationships in images, enabling efficient autoregressive image generation with high quality and reduced computational complexity.
Contribution
The paper proposes LASAD, a spatial-aware decay attention mechanism that maintains 2D spatial relationships, improving linear attention's effectiveness in image generation.
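The following Python sketch shows why flattening can break 2D spatial relationships. It assumes raster-scan (row-major) flattening of a token grid and uses Manhattan distance as the 2D measure. Both choices, and all function names, are assumptions for illustration and are not taken from the paper. Vertically adjacent tokens end up a full row apart in the sequence. The last token of one row and the first token of the next are sequence neighbors, yet they sit far apart in the image.

```python
# Hypothetical helpers illustrating raster-scan flattening; not the paper's code.

def to_2d(index: int, width: int) -> tuple[int, int]:
    """Map a row-major (raster-scan) token index back to (row, col)."""
    return divmod(index, width)


def sequence_distance(i: int, j: int) -> int:
    """Distance between two tokens in the flattened 1D sequence."""
    return abs(i - j)


def spatial_distance(i: int, j: int, width: int) -> int:
    """Manhattan distance between the tokens' positions on the 2D grid (assumed metric)."""
    (ri, ci), (rj, cj) = to_2d(i, width), to_2d(j, width)
    return abs(ri - rj) + abs(ci - cj)


if __name__ == "__main__":
    W = 16  # tokens per row, e.g. a 16x16 token grid (assumption)
    above, below = 3 * W + 5, 4 * W + 5      # vertically adjacent in the image
    row_end, next_start = 3 * W + 15, 4 * W  # consecutive in the sequence
    print(sequence_distance(above, below), spatial_distance(above, below, W))  # 16 1
    print(sequence_distance(row_end, next_start), spatial_distance(row_end, next_start, W))  # 1 16
```

Under this flattening, any decay that depends only on 1D sequence distance treats the pixel directly above as 16 steps away and a pixel on the far side of the image as an immediate neighbor.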
Findings
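LASADGen, the autoregressive image generator built on LASAD, achieves state-of-the-art performance on ImageNet.
LASAD reduces the computational complexity of attention from quadratic to linear in the sequence length.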
The method effectively captures long-range dependencies in images.
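In linear attention generally, the cost per generated token stays constant because a growing key-value cache is replaced by a fixed-size recurrent state. The Python sketch below is a hedged illustration of that property. It uses a generic linear attention with a scalar decay `lam`, which is a common variant assumed here for illustration and is not LASAD itself; all function names are hypothetical. It also checks that the recurrent form matches the quadratic parallel form.

```python
# Generic decayed linear attention: an illustrative assumption, not the
# paper's LASAD formulation. Function names are hypothetical.
#   recurrent: S_t = lam * S_{t-1} + outer(k_t, v_t);  o_t = q_t @ S_t
#   parallel:  o_t = sum_{s <= t} lam**(t - s) * (q_t . k_s) * v_s

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def recurrent_linear_attention(qs, ks, vs, lam):
    """O(N) in sequence length: one fixed d_k x d_v state, no KV cache."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        # Decay the old state, then add the current key-value outer product.
        for a in range(d_k):
            for b in range(d_v):
                state[a][b] = lam * state[a][b] + k[a] * v[b]
        # Read out with the query: o_t[b] = sum_a q[a] * S_t[a][b].
        outputs.append([sum(q[a] * state[a][b] for a in range(d_k)) for b in range(d_v)])
    return outputs


def parallel_linear_attention(qs, ks, vs, lam):
    """O(N^2) reference: every step revisits all previous keys and values."""
    outputs = []
    for t, q in enumerate(qs):
        out = [0.0] * len(vs[0])
        for s in range(t + 1):
            weight = lam ** (t - s) * dot(q, ks[s])
            for b in range(len(out)):
                out[b] += weight * vs[s][b]
        outputs.append(out)
    return outputs


def max_abs_diff(xs, ys):
    return max(abs(x - y) for row_x, row_y in zip(xs, ys) for x, y in zip(row_x, row_y))


if __name__ == "__main__":
    qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ks = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
    vs = [[1.0], [2.0], [-1.0]]
    rec = recurrent_linear_attention(qs, ks, vs, lam=0.5)
    par = parallel_linear_attention(qs, ks, vs, lam=0.5)
    print(rec)                     # [[1.0], [-1.0], [-1.75]]
    print(max_abs_diff(rec, par))  # 0.0: both forms agree
```

The recurrent state has shape d_k by d_v no matter how many tokens have already been generated. In this generic formulation, that fixed state is what makes the total cost linear in sequence length.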
Abstract
Autoregressive (AR) models have garnered significant attention in image generation for their ability to effectively capture both local and global structures within visual data. However, prevalent AR models predominantly rely on transformer architectures, which suffer from computational complexity that is quadratic in the input sequence length and from substantial memory overhead due to the need to maintain key-value caches. Although linear attention mechanisms have successfully reduced this burden in language models, our initial experiments reveal that they significantly degrade image generation quality because they fail to capture critical long-range dependencies in visual data. We propose Linear Attention with Spatial-Aware Decay (LASAD), a novel attention mechanism that explicitly preserves genuine 2D spatial relationships within the flattened image sequences by…
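The abstract is truncated on this page, so the exact form of LASAD's decay cannot be recovered from it. One hypothetical reading of "spatial-aware decay" is that the decay between two tokens depends on their distance on the 2D grid rather than on their gap in the flattened sequence. The Python sketch below follows that reading. It is an O(N²) reference that assumes decay = `gamma ** Manhattan distance` with causal masking. It only illustrates the weighting and does not reproduce the paper's linear-time computation. All names are hypothetical.

```python
# Hypothetical O(N^2) reference for a spatially decayed causal weighting.
# ASSUMPTION: decay = gamma ** (Manhattan distance on the 2D token grid).
# This is not the paper's published LASAD formula and does not show the
# linear-time computation the paper claims; it only illustrates the weighting.

def spatial_decay_weights(height, width, gamma):
    """w[i][j] = gamma ** (|row_i - row_j| + |col_i - col_j|) for j <= i, else 0."""
    n = height * width
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ri, ci = divmod(i, width)
        for j in range(i + 1):  # causal: token i sees only tokens generated before it
            rj, cj = divmod(j, width)
            weights[i][j] = gamma ** (abs(ri - rj) + abs(ci - cj))
    return weights


def sequence_decay_weights(height, width, gamma):
    """Plain 1D decay on the flattened sequence: gamma ** (i - j) for j <= i."""
    n = height * width
    return [[gamma ** (i - j) if j <= i else 0.0 for j in range(n)] for i in range(n)]


def decayed_attention(qs, ks, vs, weights):
    """o_i = sum_j weights[i][j] * (q_i . k_j) * v_j, computed naively."""
    outputs = []
    for i, q in enumerate(qs):
        out = [0.0] * len(vs[0])
        for j, (k, v) in enumerate(zip(ks, vs)):
            score = weights[i][j] * sum(a * b for a, b in zip(q, k))
            for d in range(len(out)):
                out[d] += score * v[d]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    spatial = spatial_decay_weights(4, 4, gamma=0.5)
    sequence = sequence_decay_weights(4, 4, gamma=0.5)
    # Token 5 = (1, 1) sits directly below token 1 = (0, 1).
    print(spatial[5][1], sequence[5][1])  # 0.5 0.0625
    # Token 4 = (1, 0) follows token 3 = (0, 3) in the sequence but is 4 cells away.
    print(spatial[4][3], sequence[4][3])  # 0.0625 0.5
```

Under this assumed form, a token keeps a strong weight on the token directly above it. Plain 1D decay on a raster-scan sequence would discount that token by `gamma ** width`.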
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
