M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation
Sucheng Ren, Yaodong Yu, Nataniel Ruiz, Feng Wang, Alan Yuille, Cihang Xie

TL;DR
This paper introduces M-VAR, a decoupled scale-wise autoregressive model for high-quality image generation that improves efficiency and performance by separating intra-scale and inter-scale modeling with specialized mechanisms.
Contribution
The paper proposes M-VAR, a novel framework that decouples scale-wise autoregressive modeling into intra-scale and inter-scale components, enhancing efficiency and image quality.
Findings
Outperforms existing models in image quality and speed.
Achieves 1.78 FID on ImageNet 256×256 with fewer parameters.
Surpasses prior autoregressive and diffusion models in benchmarks.
Abstract
A recent work in computer vision, VAR, proposes a new autoregressive paradigm for image generation. Diverging from vanilla next-token prediction, VAR structurally reformulates image generation as coarse-to-fine next-scale prediction. In this paper, we show that this scale-wise autoregressive framework can be effectively decoupled into intra-scale modeling, which captures local spatial dependencies within each scale, and inter-scale modeling, which models cross-scale relationships progressively from coarse to fine scales. This decoupled structure allows us to rebuild VAR in a more computationally efficient manner. Specifically, for intra-scale modeling (crucial for generating high-fidelity images), we retain the original bidirectional self-attention design to ensure comprehensive modeling; for inter-scale modeling, which semantically…
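To make the intra-scale idea concrete, here is a minimal sketch of the two attention patterns involved. It is not the authors' implementation. It contrasts a block-wise causal mask, assumed here as the pattern for joint next-scale prediction (each token sees its own scale and every coarser one), with an intra-scale mask that keeps only bidirectional attention inside each scale. The function names and the token-map side lengths `(1, 2, 3, 4)` are hypothetical and chosen only for illustration. The inter-scale mechanism is not modeled, since the abstract is truncated before describing it.

```python
def scale_ids(side_lengths):
    """Return, for every token in the flattened sequence, its scale index."""
    ids = []
    for k, side in enumerate(side_lengths):
        ids.extend([k] * (side * side))  # a side x side token map per scale
    return ids


def blockwise_causal_mask(side_lengths):
    """Assumed joint pattern: a query sees its own scale and all coarser ones."""
    ids = scale_ids(side_lengths)
    return [[q >= k for k in ids] for q in ids]


def intra_scale_mask(side_lengths):
    """Intra-scale pattern: bidirectional attention within a single scale only."""
    ids = scale_ids(side_lengths)
    return [[q == k for k in ids] for q in ids]


def count_pairs(mask):
    """Number of (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


# Hypothetical side lengths, for illustration only.
sides = (1, 2, 3, 4)
full = blockwise_causal_mask(sides)
intra = intra_scale_mask(sides)
print(count_pairs(full), count_pairs(intra))
```

In this toy setting the intra-scale mask is a strict subset of the block-wise causal one, so the attention computed within scales covers fewer query-key pairs; the cross-scale pairs it leaves out are the ones a separate inter-scale mechanism would have to handle.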
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · AI in cancer detection
Methods: Mamba (Linear-Time Sequence Modeling with Selective State Spaces) · Diffusion
