MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction

Lianwei Yang; Zhikai Li; Junrui Xiao; Haisong Gong; Qingyi Gu

arXiv:2406.09229·cs.CV·June 14, 2024

TL;DR

This paper introduces MGRQ, a mixed granularity reconstruction method for post-training quantization of Vision Transformers. It combines global and local supervision strategies to improve accuracy, especially in low-bit scenarios.

Contribution

MGRQ is the first to apply mixed granularity reconstruction with global and local supervision to enhance PTQ performance in Vision Transformers.

Findings

01 MGRQ outperforms existing methods in low-bit quantization accuracy.

02 Global and local supervision strategies effectively reduce quantization errors.

03 Extensive experiments validate the robustness and practicality of MGRQ.

Abstract

Post-training quantization (PTQ) efficiently compresses vision models, but it comes with some degree of accuracy degradation. Reconstruction methods aim to enhance model performance by narrowing the gap between the quantized model and the full-precision model, often yielding promising results. However, efforts to significantly improve the performance of PTQ through reconstruction in the Vision Transformer (ViT) have shown limited efficacy. In this paper, we conduct a thorough analysis of the reasons for this limited effectiveness and propose MGRQ (Mixed Granularity Reconstruction Quantization) to address this issue. Unlike previous reconstruction schemes, MGRQ introduces a mixed granularity reconstruction approach. Specifically, MGRQ enhances the performance of PTQ by introducing Extra-Block Global Supervision and Intra-Block Local Supervision,…
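
As a rough illustration of the "gap between the quantized model and the full-precision model" that reconstruction methods try to narrow, the sketch below fake-quantizes the weights of a single linear layer and measures the mean squared difference between its outputs and the full-precision outputs on a small calibration set. This is not MGRQ and not taken from the paper: the per-tensor uniform quantizer, the toy layer sizes, and every function name (`fake_quantize`, `quantize_matrix`, `reconstruction_error`) are assumptions made for illustration. The sketch only computes the kind of objective a reconstruction method would minimize. It does not perform the optimization.

```python
import random


def fake_quantize(values, num_bits):
    """Uniform quantize-then-dequantize of a list of floats (illustrative, per-tensor)."""
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Snap each value to the nearest of the evenly spaced grid points in [lo, hi].
    return [round((v - lo) / scale) * scale + lo for v in values]


def quantize_matrix(weights, num_bits):
    """Apply fake_quantize to a matrix stored as a list of rows."""
    flat = [v for row in weights for v in row]
    q = fake_quantize(flat, num_bits)
    n = len(weights[0])
    return [q[i * n:(i + 1) * n] for i in range(len(weights))]


def linear(weights, x):
    """Compute y = W x for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def reconstruction_error(w_fp, w_q, calib_inputs):
    """Mean squared gap between full-precision and quantized layer outputs."""
    total, count = 0.0, 0
    for x in calib_inputs:
        y_fp = linear(w_fp, x)
        y_q = linear(w_q, x)
        total += sum((a - b) ** 2 for a, b in zip(y_fp, y_q))
        count += len(y_fp)
    return total / count


# Toy layer and calibration data (hypothetical sizes, random values).
random.seed(0)
w_fp = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
calib = [[random.gauss(0, 1) for _ in range(16)] for _ in range(32)]

# Output gap at several bit widths.
errors = {
    bits: reconstruction_error(w_fp, quantize_matrix(w_fp, bits), calib)
    for bits in (8, 4, 3)
}
for bits, err in errors.items():
    print(f"{bits}-bit output MSE: {err:.6f}")
```

In this toy setting the output gap is far larger at 4 bits than at 8 bits, which matches the abstract's framing of quantization as a trade between compression and accuracy.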

Taxonomy

Topics: Image Processing Techniques and Applications · CCD and CMOS Imaging Sensors · Image and Signal Denoising Methods

Methods: Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer