MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective

Weitian Wang; Rai Shubham; Cecilia De La Parra; Akash Kumar

arXiv:2507.19131·cs.CV·July 28, 2025

TL;DR

MixA-Q is a mixed-precision activation quantization method for window-based vision transformers. It exploits intra-layer activation sparsity to improve inference efficiency with minimal accuracy loss, and it can be combined with most existing QAT and PTQ methods.

Contribution

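The paper proposes an intra-layer, sparsity-aware quantization framework built around a Two-Branch Swin Block, which processes activations in separate high-bit and low-bit branches to improve the trade-off between speed and accuracy in quantized vision transformer inference.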

Findings

01. Achieves 1.35x speedup without accuracy loss in PTQ.

02. Attains 1.25x speedup with no accuracy loss and 1.53x with slight accuracy drop in QAT.

03. Improves W4A4 model mAP by 0.7%, reducing quantization degradation by 24%.

Abstract

In this paper, we propose MixA-Q, a mixed-precision activation quantization framework that leverages intra-layer activation sparsity (a concept widely explored in activation pruning methods) for efficient inference of quantized window-based vision transformers. For a given uniform-bit quantization configuration, MixA-Q separates the batched window computations within Swin blocks and assigns a lower bit width to the activations of less important windows, improving the trade-off between model performance and efficiency. We introduce a Two-Branch Swin Block that processes activations separately in high- and low-bit precision, so that our method integrates with most quantization-aware training (QAT) and post-training quantization (PTQ) methods, either directly or with simple modifications. Our experimental evaluations on the COCO dataset demonstrate that MixA-Q achieves a training-free…
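To make the idea of window-level mixed precision concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The importance score (mean absolute activation), the 4-bit/2-bit pairing, the 50% low-bit ratio, and all function names are assumptions chosen for illustration; the abstract does not specify them. Each window is routed to a high-bit or a low-bit branch and fake-quantized with a symmetric uniform quantizer.

```python
"""Illustrative sketch only: per-window mixed-precision activation quantization."""


def quantize(values, bits):
    """Symmetric uniform fake-quantization of a list of floats to `bits` bits."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    levels = 2 ** (bits - 1) - 1  # 7 levels per sign for 4 bits, 1 for 2 bits
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def window_importance(window):
    """Hypothetical importance score: mean absolute activation (an assumption)."""
    return sum(abs(v) for v in window) / len(window)


def mixed_precision_quantize(windows, high_bits=4, low_bits=2, low_ratio=0.5):
    """Send the least important fraction of windows to the low-bit branch."""
    num_low = int(len(windows) * low_ratio)
    order = sorted(range(len(windows)), key=lambda i: window_importance(windows[i]))
    low_ids = set(order[:num_low])

    quantized, bits_used = [], []
    for i, window in enumerate(windows):
        bits = low_bits if i in low_ids else high_bits  # choose the branch
        quantized.append(quantize(window, bits))
        bits_used.append(bits)
    return quantized, bits_used


# Toy activations: four windows of four values each (made-up numbers).
windows = [
    [0.9, -0.8, 0.7, 0.6],
    [0.01, 0.0, -0.02, 0.01],
    [0.5, -0.4, 0.45, 0.3],
    [0.05, 0.02, -0.01, 0.0],
]
quantized, bits_used = mixed_precision_quantize(windows)
print(bits_used)  # [4, 2, 4, 2]
```

In this toy input the two near-zero windows land in the low-bit branch, while the two windows with large activations keep the higher bit width. A real deployment would run each branch as a batched low-precision kernel rather than a Python loop.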

Taxonomy

Topics: Transition Metal Oxide Nanomaterials · CCD and CMOS Imaging Sensors · Advanced Neural Network Applications