Depth Adaptive Efficient Visual Autoregressive Modeling

Chunliang Li; Tianze Cao; Sanyuan Zhao

arXiv:2604.17286·cs.CV·April 21, 2026

TL;DR

DepthVAR is a training-free framework for visual autoregressive modeling that allocates computational depth per token instead of pruning tokens outright, accelerating inference by 2.3× to 3.1× with minimal quality loss.

Contribution

The paper presents a training-free method that adaptively assigns computational depth to each token, outperforming existing hard-pruning methods in the compute-performance trade-off.

Findings

01 Achieves 2.3× to 3.1× acceleration with minimal quality loss.

02 Outperforms existing hard-pruning methods in compute-performance trade-offs.

03 Demonstrates effectiveness across high-resolution image generation tasks.

Abstract

Visual Autoregressive (VAR) modeling inefficiently applies a fixed computational depth to each position when generating high-resolution images. While existing methods accelerate inference by pruning tokens using frequency maps, their binary hard-pruning approach is fundamentally limited and fails to improve quality even with better frequency estimation. Observing that VAR models possess significant depth redundancy, we propose a paradigm shift from pruning entire tokens to adaptively allocating per-token computational depth. To this end, we introduce DepthVAR, a training-free framework that dynamically allocates computation. It integrates an adaptive depth scheduler, which assigns computational depth via a cyclic rotated schedule for balanced, non-static refinement, with a dynamic inference process that translates these depths into layer-major masks, selectively applies transformer…
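
The abstract is truncated here, so the exact scheduler is not recoverable from this page. As a rough illustration of the idea it describes (per-token depths assigned by a rotating schedule, then converted into one mask per layer), the following toy sketch may help. It is not the authors' implementation: the function names, the `depth_levels` values, the choice to rotate by generation step, and the rule "token is active in layer l when l < depth" are all assumptions made for illustration.

```python
# Toy sketch of per-token depth allocation. All names and rules here are
# hypothetical; they are NOT taken from the DepthVAR code base.

def rotated_depths(num_tokens, depth_levels, step):
    """Assign each token a depth from depth_levels, shifting the
    assignment by `step` so no position is permanently shallow."""
    k = len(depth_levels)
    return [depth_levels[(i + step) % k] for i in range(num_tokens)]


def layer_major_masks(depths, num_layers):
    """Return masks[l][i]: True when token i is processed by layer l.
    Assumption: a token with depth d passes through the first d layers."""
    return [[layer < d for d in depths] for layer in range(num_layers)]


def run_layers(tokens, masks, layer_fn):
    """Apply layer_fn only to active tokens; inactive tokens pass through."""
    for layer, mask in enumerate(masks):
        tokens = [layer_fn(layer, x) if active else x
                  for x, active in zip(tokens, mask)]
    return tokens


num_layers = 4
depth_levels = [4, 2, 1]  # assumed depth budget pattern

depths_step0 = rotated_depths(6, depth_levels, step=0)  # [4, 2, 1, 4, 2, 1]
depths_step1 = rotated_depths(6, depth_levels, step=1)  # [2, 1, 4, 2, 1, 4]

masks = layer_major_masks(depths_step0, num_layers)

# A stand-in "layer" that just counts how many layers touched each token.
counts = run_layers([0] * 6, masks, lambda layer, x: x + 1)

# Fraction of token-layer applications actually executed (14 of 24 here).
compute_fraction = sum(depths_step0) / (len(depths_step0) * num_layers)
print(counts, compute_fraction)
```

In this sketch every token still receives some computation, which is the contrast with binary hard pruning that the abstract draws. The rotation changes which positions get the deeper budget from one step to the next.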

Code & Models

Repositories

STOVAGtz/DepthVAR (GitHub)
