ActVAR: Activating Mixtures of Weights and Tokens for Efficient Visual Autoregressive Generation
Kaixin Zhang, Ruiqing Yang, Yuan Zhang, Shan You, Tao Huang

TL;DR
ActVAR is a dynamic activation framework for visual autoregressive models that selectively activates model weights and tokens, cutting compute (up to 21.2% fewer FLOPs on ImageNet 256x256) while largely preserving generation quality.
Contribution
It proposes a dual-sparsity approach that pairs a learnable expert router with a gated token selector, making image generation more efficient without permanently removing weights or tokens.
Findings
Achieves up to a 21.2% reduction in FLOPs on ImageNet 256x256.
Maintains high-quality image generation with minimal performance loss.
Uses two-stage distillation to align the learned routing with the pretrained model.
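To get a feel for how the two kinds of sparsity could combine into a FLOPs figure, here is a rough back-of-the-envelope sketch. It is not the paper's accounting: the function names, the assumption that only FFN layers are sparsified, the assumption that experts are equal-sized, and every number in the usage lines are illustrative, and router and selector overhead is ignored.

```python
def ffn_flops_fraction(num_experts: int, experts_per_token: int,
                       token_keep_ratio: float) -> float:
    """Fraction of dense FFN FLOPs that remain under dual sparsity.

    Assumes the FFN is split into equal-sized experts, so activating
    k of E experts costs k/E of the dense FFN, and that only a
    `token_keep_ratio` share of tokens is processed at all.
    Router/selector overhead is ignored (assumption).
    """
    if not 0 < experts_per_token <= num_experts:
        raise ValueError("experts_per_token must be in 1..num_experts")
    if not 0.0 < token_keep_ratio <= 1.0:
        raise ValueError("token_keep_ratio must be in (0, 1]")
    weight_fraction = experts_per_token / num_experts
    return weight_fraction * token_keep_ratio


def model_flops_reduction(ffn_share: float, ffn_fraction: float) -> float:
    """Overall FLOPs reduction if only the FFN share of the model is sparsified."""
    return ffn_share * (1.0 - ffn_fraction)


# Illustrative numbers only, not taken from the paper.
remaining = ffn_flops_fraction(num_experts=8, experts_per_token=6,
                               token_keep_ratio=0.8)
print(f"FFN FLOPs remaining: {remaining:.2f}")
print(f"Model-level reduction: {model_flops_reduction(0.5, remaining):.2f}")
```

The point of the sketch is only that weight sparsity and token sparsity multiply, so moderate settings on each axis can add up to a double-digit overall reduction.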
Abstract
Visual Autoregressive (VAR) models enable efficient image generation via next-scale prediction but face escalating computational costs as sequence length grows. Existing static pruning methods degrade performance by permanently removing weights or tokens, disrupting pretrained dependencies. To address this, we propose ActVAR, a dynamic activation framework that introduces dual sparsity across model weights and token sequences to enhance efficiency without sacrificing capacity. ActVAR decomposes feedforward networks (FFNs) into lightweight expert sub-networks and employs a learnable router to dynamically select token-specific expert subsets based on content. Simultaneously, a gated token selector identifies high-update-potential tokens for computation while reconstructing unselected tokens to preserve global context and sequence alignment. Training employs a two-stage knowledge…
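The mechanism described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `sparse_ffn`, `router_w`, and `gate_w` are hypothetical, the router is a plain linear layer with softmax and top-k selection, the token gate is a linear score, and unselected tokens are simply passed through unchanged as a stand-in for the reconstruction step, whose details the abstract does not give.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def expert_ffn(x, w_in, w_out):
    """One lightweight expert: a two-layer MLP with ReLU."""
    return np.maximum(x @ w_in, 0.0) @ w_out


def sparse_ffn(tokens, experts, router_w, gate_w, top_k, keep_ratio):
    """Apply token selection, then per-token top-k expert routing.

    tokens:   (n, d) token features
    experts:  list of (w_in, w_out) pairs with shapes (d, h) and (h, d)
    router_w: (d, num_experts) linear router weights (hypothetical)
    gate_w:   (d,) linear token-gate weights (hypothetical)
    """
    n, _ = tokens.shape

    # Token sparsity: keep the tokens with the highest gate scores.
    gate_scores = tokens @ gate_w
    num_keep = max(1, int(round(keep_ratio * n)))
    keep_idx = np.argsort(-gate_scores)[:num_keep]

    # Assumption: unselected tokens pass through unchanged, which keeps
    # the sequence length and alignment intact.
    out = tokens.copy()
    selected = tokens[keep_idx]

    # Weight sparsity: each selected token activates only its top-k experts.
    probs = softmax(selected @ router_w)
    top_experts = np.argsort(-probs, axis=1)[:, :top_k]

    update = np.zeros_like(selected)
    for e, (w_in, w_out) in enumerate(experts):
        rows, _ = np.nonzero(top_experts == e)  # tokens routed to expert e
        if rows.size == 0:
            continue
        weights = probs[rows, e][:, None]
        update[rows] += weights * expert_ffn(selected[rows], w_in, w_out)

    out[keep_idx] = selected + update  # residual update for selected tokens
    return out, keep_idx


rng = np.random.default_rng(0)
d, h, num_experts = 8, 4, 4
tokens = rng.normal(size=(16, d))
experts = [(rng.normal(size=(d, h)), rng.normal(size=(h, d)))
           for _ in range(num_experts)]
router_w = rng.normal(size=(d, num_experts))
gate_w = rng.normal(size=d)
out, keep_idx = sparse_ffn(tokens, experts, router_w, gate_w,
                           top_k=2, keep_ratio=0.5)
print(out.shape, len(keep_idx))
```

Because every token stays in the output tensor, later layers still see a full-length sequence; only the amount of computation spent per token varies.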
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Face recognition and analysis
