Markovian Scale Prediction: A New Era of Visual Autoregressive Generation

Yu Zhang; Jingyi Liu; Yiwei Shi; Qi Zhang; Duoqian Miao; Changwei Wang; Longbing Cao

arXiv:2511.23334·cs.CV·March 4, 2026

TL;DR

This paper introduces Markov-VAR, a visual autoregressive model that recasts next-scale prediction as a Markov process: each scale is treated as a Markov state, and a sliding window compresses earlier scales into a compact history vector. Dropping the full-context dependency lowers computational cost while matching or improving generation quality.

Contribution

The paper proposes Markov-VAR, a novel non-full-context Markov process formulation for visual autoregressive generation that enhances efficiency and scalability over traditional VAR models.

Findings

01 Reduces FID by 10.5% on ImageNet relative to VAR

02 Decreases peak memory consumption by 83.8% relative to VAR

03 Achieves comparable or better generation quality at lower computational cost

Abstract

Visual AutoRegressive modeling (VAR) based on next-scale prediction has revitalized autoregressive visual generation. Although its full-context dependency, i.e., modeling all previous scales for next-scale prediction, facilitates more stable and comprehensive representation learning by leveraging complete information flow, the resulting computational inefficiency and substantial overhead severely hinder VAR's practicality and scalability. This motivates us to develop a new VAR model with better performance and efficiency without full-context dependency. To address this, we reformulate VAR as a non-full-context Markov process, proposing Markov-VAR. It is achieved via Markovian Scale Prediction: we treat each scale as a Markov state and introduce a sliding window that compresses certain previous scales into a compact history vector to compensate for historical information loss owing to…
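
As a rough illustration of why dropping full-context dependency can save memory, the sketch below counts how many tokens from earlier scales are visible when predicting each scale. It is not the paper's implementation. The scale schedule is an assumption borrowed from the 256x256 configuration of the original VAR, and the function names, as well as the choice to represent the compressed history as a single extra token (`history_tokens=1`), are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence

# Assumed scale schedule (side length of each square token map), borrowed from
# the 256x256 configuration of the original VAR; not taken from this paper.
SCALES = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)


def tokens_per_scale(scales: Sequence[int]) -> List[int]:
    """Number of tokens in each square token map."""
    return [s * s for s in scales]


def full_context_sizes(scales: Sequence[int]) -> List[int]:
    """Tokens from ALL earlier scales visible at each step (full-context)."""
    counts = tokens_per_scale(scales)
    running = [0] + list(accumulate(counts))
    return running[:-1]  # entry k = sum of token counts of scales 0..k-1


def markov_context_sizes(scales: Sequence[int], history_tokens: int = 1) -> List[int]:
    """Tokens visible at each step under a hypothetical Markov formulation.

    Assumption: each step sees only the immediately preceding scale, plus
    `history_tokens` slots standing in for the compressed history vector
    once there is something older than the preceding scale to compress.
    """
    counts = tokens_per_scale(scales)
    sizes = []
    for k in range(len(counts)):
        if k == 0:
            sizes.append(0)  # first scale has no predecessor
        else:
            history = history_tokens if k >= 2 else 0
            sizes.append(counts[k - 1] + history)
    return sizes


if __name__ == "__main__":
    for k, (full, markov) in enumerate(
        zip(full_context_sizes(SCALES), markov_context_sizes(SCALES))
    ):
        print(f"scale {k}: full-context={full:4d}  markov={markov:4d}")
```

Under these assumptions, the final step sees 424 earlier tokens with full context and 170 in the Markov variant. The counts only illustrate the trend and are not expected to reproduce the memory figures reported in the paper, which depend on the actual architecture and resolution.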

Taxonomy

Topics: Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning