Scalable Autoregressive Image Generation with Mamba
Haopeng Li, Jinyue Yang, Kexin Wang, Xuerui Qiu, Yuhong Chou, Xin Li, Guoqi Li

TL;DR
AiM introduces a scalable autoregressive image generation model based on the Mamba architecture, achieving superior quality and faster inference by leveraging long-sequence modeling without complex 2D adaptations.
Contribution
The paper presents AiM, a novel autoregressive image generator using Mamba, which simplifies 2D image modeling and outperforms existing AR models in quality and speed.
Findings
Achieves an FID of 2.21 on ImageNet1K 256x256, outperforming existing AR models.
Offers models ranging from 148M to 1.3B parameters.
Demonstrates 2 to 10 times faster inference than diffusion models.
Abstract
We introduce AiM, an autoregressive (AR) image generative model based on the Mamba architecture. AiM employs Mamba, a novel state-space model characterized by its exceptional performance for long-sequence modeling with linear time complexity, to supplant the commonly utilized Transformers in AR image generation models, aiming to achieve both superior generation quality and enhanced inference speed. Unlike existing methods that adapt Mamba to handle two-dimensional signals via multi-directional scans, AiM directly utilizes the next-token prediction paradigm for autoregressive image generation. This approach circumvents the need for extensive modifications to enable Mamba to learn 2D spatial representations. By implementing straightforward yet strategically targeted modifications for visual generative tasks, we preserve Mamba's core structure, fully exploiting its efficient long-sequence…
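The next-token prediction paradigm described in the abstract can be illustrated with a minimal sketch. It is not the AiM implementation: `ToyRecurrentModel`, `start_token`, the decay constant, and the vocabulary and grid sizes are hypothetical stand-ins chosen for illustration, and the tokenizer that maps pixels to discrete tokens is left out. The sketch flattens a 2D token grid in raster order and samples tokens one at a time from a model that carries a fixed-size state, so each step costs the same regardless of how many tokens came before.

```python
import numpy as np


def flatten_raster(grid):
    """Flatten an (H, W) grid of token ids into a 1D sequence, row by row."""
    return np.asarray(grid).reshape(-1)


def unflatten_raster(seq, height, width):
    """Inverse of flatten_raster: rebuild the (H, W) grid from a 1D sequence."""
    return np.asarray(seq).reshape(height, width)


class ToyRecurrentModel:
    """Hypothetical stand-in for a sequence model with a fixed-size state.

    This is NOT Mamba; it only mimics the property that one generation
    step updates a constant-size state instead of re-reading the prefix.
    """

    def __init__(self, vocab_size, state_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.embed = rng.normal(size=(vocab_size, state_dim))
        self.readout = rng.normal(size=(state_dim, vocab_size))
        self.decay = 0.9  # assumed constant; real models learn this
        self.state_dim = state_dim

    def init_state(self):
        return np.zeros(self.state_dim)

    def step(self, state, token):
        # Constant work per token: update the state, then read out logits.
        state = self.decay * state + self.embed[token]
        logits = state @ self.readout
        return state, logits


def generate(model, start_token, num_tokens, seed=0):
    """Sample num_tokens token ids one at a time (next-token prediction)."""
    rng = np.random.default_rng(seed)
    state = model.init_state()
    token = start_token
    out = []
    for _ in range(num_tokens):
        state, logits = model.step(state, token)
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        token = int(rng.choice(len(probs), p=probs))
        out.append(token)
    return out


# Usage: sample a 4x4 grid of tokens from a 16-entry vocabulary.
model = ToyRecurrentModel(vocab_size=16, state_dim=8)
tokens = generate(model, start_token=3, num_tokens=4 * 4)
grid = unflatten_raster(tokens, 4, 4)
```

Because the sequence is generated in a single raster order, no multi-directional scan over the grid is needed; the 2D layout is recovered only at the end by reshaping.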
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Diffusion
