StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models

Keli Liu; Zhendong Wang; Wengang Zhou; Houqiang Li

arXiv:2603.01757·cs.CV·March 3, 2026

Open Access

TL;DR

StepVAR is a training-free token pruning method for visual autoregressive models. It accelerates inference by scoring tokens on both structural and textural importance, reducing computational cost while maintaining generation quality.

Contribution

We introduce a training-free token pruning framework that combines high-pass filtering and PCA to preserve both local textures and global structure in VAR models.

Findings

01. Achieves significant inference speedup in VAR models.
02. Maintains high-quality visual generation comparable to full models.
03. Outperforms existing acceleration methods across multiple datasets.

Abstract

Visual AutoRegressive (VAR) models based on next-scale prediction enable efficient hierarchical generation, yet the inference cost grows quadratically at high resolutions. We observe that the computationally intensive later scales predominantly refine high-frequency textures and exhibit substantial spatial redundancy, in contrast to earlier scales that determine the global structural layout. Existing pruning methods primarily focus on high-frequency detection for token selection, often overlooking structural coherence and consequently degrading global semantics. To address this limitation, we propose StepVAR, a training-free token pruning framework that accelerates VAR inference by jointly considering structural and textural importance. Specifically, we employ a lightweight high-pass filter to capture local texture details, while leveraging Principal Component Analysis (PCA) to preserve…
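
The idea in the abstract, scoring tokens by a high-pass texture response and by a PCA-based structure response and then keeping only the most important ones, can be illustrated with a minimal NumPy sketch. The abstract is truncated here, so the way the two scores are combined is not stated; the 3x3 box-filter residual, the min-max normalization, the weighted sum with `alpha`, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def high_pass_score(feat):
    """Texture score: norm of each token's deviation from its 3x3 local mean.

    Assumption: a box-filter residual stands in for the paper's
    "lightweight high-pass filter".
    """
    feat = np.asarray(feat, dtype=np.float64)
    h, w, _ = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)), mode="edge")
    local_mean = np.zeros_like(feat)
    for dy in range(3):
        for dx in range(3):
            local_mean += padded[dy:dy + h, dx:dx + w]
    local_mean /= 9.0
    return np.linalg.norm(feat - local_mean, axis=-1)  # shape (H, W)


def pca_score(feat, k=4):
    """Structure score: norm of each token's projection on the top-k principal axes.

    Assumption: projection energy is used as a proxy for structural importance.
    """
    feat = np.asarray(feat, dtype=np.float64)
    h, w, c = feat.shape
    x = feat.reshape(-1, c)
    x = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered token matrix.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[:k].T
    return np.linalg.norm(proj, axis=-1).reshape(h, w)


def _minmax(score):
    """Rescale a score map to [0, 1]; a constant map becomes all zeros."""
    span = score.max() - score.min()
    return (score - score.min()) / span if span > 0 else np.zeros_like(score)


def select_tokens(feat, keep_ratio=0.5, alpha=0.5, k=4):
    """Return a boolean (H, W) mask of tokens to keep.

    Assumption: the two scores are blended by a weighted sum with weight alpha.
    """
    score = alpha * _minmax(high_pass_score(feat)) \
        + (1.0 - alpha) * _minmax(pca_score(feat, k))
    flat = score.ravel()
    n_keep = max(1, int(round(keep_ratio * flat.size)))
    keep_idx = np.argpartition(flat, -n_keep)[-n_keep:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[keep_idx] = True
    return mask.reshape(score.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(16, 16, 32))  # hypothetical late-scale token map
    mask = select_tokens(tokens, keep_ratio=0.25)
    print(mask.shape, int(mask.sum()))
```

In a real VAR pipeline, only the tokens selected by such a mask would be passed through the transformer at the expensive later scales; how the pruned positions are filled back in is not covered by the abstract and is left out of this sketch.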

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning