Markovian Scale Prediction: A New Era of Visual Autoregressive Generation
Yu Zhang, Jingyi Liu, Yiwei Shi, Qi Zhang, Duoqian Miao, Changwei Wang, Longbing Cao

TL;DR
This paper introduces Markov-VAR, a visual autoregressive model that reformulates next-scale prediction as a Markov process. Each scale is treated as a Markov state, and a sliding window compresses earlier scales into a compact history vector, so the model avoids full-context dependency and its computational cost while aiming for better generation quality.
Contribution
The paper proposes Markov-VAR, a novel non-full-context Markov process formulation for visual autoregressive generation that enhances efficiency and scalability over traditional VAR models.
Findings
Reduces FID by 10.5% on ImageNet
Decreases peak memory consumption by 83.8%
Achieves comparable or better generation quality with less computational cost
Abstract
Visual AutoRegressive modeling (VAR) based on next-scale prediction has revitalized autoregressive visual generation. Although its full-context dependency, i.e., modeling all previous scales for next-scale prediction, facilitates more stable and comprehensive representation learning by leveraging complete information flow, the resulting computational inefficiency and substantial overhead severely hinder VAR's practicality and scalability. This motivates us to develop a new VAR model with better performance and efficiency without full-context dependency. To address this, we reformulate VAR as a non-full-context Markov process, proposing Markov-VAR. It is achieved via Markovian Scale Prediction: we treat each scale as a Markov state and introduce a sliding window that compresses certain previous scales into a compact history vector to compensate for historical information loss owing to…
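To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It compares how many tokens condition each next-scale prediction under full-context VAR versus a Markov-style state built from the latest scale plus one compressed history vector. The mean-pooling compressor, the `window` parameter, the token dimension, and the side lengths are all assumptions chosen for illustration; the excerpt does not specify how Markov-VAR actually compresses history.

```python
import numpy as np

def compress_history(older_scales, window):
    """Hypothetical compressor (assumption): mean-pool the tokens of the
    last `window` older scales into a single history vector."""
    recent = older_scales[-window:]
    return np.concatenate(recent, axis=0).mean(axis=0)

def markov_state(prev_scales, window):
    """Build the conditioning state for the next scale: the tokens of the
    latest scale plus one history vector summarizing older scales."""
    current = prev_scales[-1]
    older = prev_scales[:-1]
    if older:
        history = compress_history(older, window)
    else:
        # No older scales yet: use a zero vector as a placeholder history.
        history = np.zeros(current.shape[1])
    return np.concatenate([current, history[None, :]], axis=0)

def context_lengths(side_lengths, dim=8, window=2, seed=0):
    """Return (full_context, markov) token counts used to predict each
    scale after the first. Token maps are random stand-ins."""
    rng = np.random.default_rng(seed)
    scales = [rng.standard_normal((s * s, dim)) for s in side_lengths]
    full, markov = [], []
    for k in range(1, len(scales)):
        prev = scales[:k]
        # Full-context: condition on every token of every previous scale.
        full.append(sum(t.shape[0] for t in prev))
        # Markov-style: latest scale tokens plus one history vector.
        markov.append(markov_state(prev, window).shape[0])
    return full, markov

if __name__ == "__main__":
    full, markov = context_lengths([1, 2, 3, 4])
    print("full-context:", full)
    print("markov-style:", markov)
```

In this toy setup the full-context count accumulates over all earlier scales, while the Markov-style count is bounded by the size of the latest scale plus one vector, which is the intuition behind the efficiency argument.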
Taxonomy
Topics: Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
