CPT-V: A Contrastive Approach to Post-Training Quantization of Vision Transformers

Natalia Frumkin, Dibakar Gope, and Diana Marculescu

arXiv:2211.09643·cs.CV·January 10, 2023·1 cites

TL;DR

CPT-V is a contrastive, self-supervised method that improves the accuracy of post-training-quantized vision transformers by perturbing their quantization scales, using a block-wise evolutionary search and only 1,000 calibration images.

Contribution

CPT-V combines a contrastive loss with an evolutionary search over quantization scales to improve the accuracy of already-quantized vision transformers without retraining.
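As a rough illustration of searching over quantization scales, the sketch below perturbs a single scale with a simple evolutionary loop and keeps a candidate only when it lowers a loss. It is not the authors' implementation: the names `quantize`, `evolve_scale`, and `mse` are hypothetical, the search covers one scale rather than the blocks of a transformer, and mean squared error stands in for the contrastive loss as the fitness function.

```python
import numpy as np

def quantize(x, scale, bits=3):
    """Symmetric uniform quantization followed by dequantization."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), qmin, qmax) * scale

def mse(reference, approx):
    """Stand-in fitness function (assumption: CPT-V uses a contrastive loss)."""
    return float(np.mean((reference - approx) ** 2))

def evolve_scale(x, init_scale, loss_fn, bits=3, pop=16, gens=20, sigma=0.05, seed=0):
    """Perturb the scale for `gens` generations, keeping the best candidate seen."""
    rng = np.random.default_rng(seed)
    best_scale = init_scale
    best_loss = loss_fn(x, quantize(x, best_scale, bits))
    for _ in range(gens):
        # Multiplicative Gaussian perturbations around the current best scale.
        candidates = best_scale * (1.0 + sigma * rng.standard_normal(pop))
        for s in candidates[candidates > 0]:  # a scale must stay positive
            loss = loss_fn(x, quantize(x, s, bits))
            if loss < best_loss:
                best_scale, best_loss = float(s), loss
    return best_scale, best_loss

# Usage: start from a max-based scale and refine it.
weights = np.random.default_rng(1).standard_normal(256)
init_scale = float(np.abs(weights).max() / (2 ** (3 - 1) - 1))
init_loss = mse(weights, quantize(weights, init_scale))
scale, loss = evolve_scale(weights, init_scale, mse)
```

Because a candidate is accepted only when it improves the loss, the returned loss can never be worse than the loss at the initial scale.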

Findings

01. Improves ViT-Base top-1 accuracy by up to 10.30% at 3-bit quantization.

02. Effective across various ViT architectures and quantization levels.

03. Requires only 1,000 calibration images for optimization.

Abstract

When considering post-training quantization, prior work has typically focused on developing a mixed-precision scheme or learning the best way to partition a network for quantization. In our work, CPT-V, we look at a general way to improve the accuracy of networks that have already been quantized, simply by perturbing the quantization scales. Borrowing the idea of contrastive loss from self-supervised learning, we find a robust way to jointly minimize a loss function using just 1,000 calibration images. To determine the best-performing quantization scale, CPT-V contrasts the features of quantized and full-precision models in a self-supervised fashion. Unlike traditional reconstruction-based loss functions, the use of a contrastive loss function not only rewards similarity between the quantized and full-precision outputs but also helps in distinguishing the quantized output…
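To make the idea of contrasting quantized and full-precision features concrete, here is a minimal sketch of an InfoNCE-style loss. It is one plausible form, not necessarily the loss used in the paper: the function name `info_nce`, the `temperature` value, and the choice of negatives (features of the other images in the same batch) are all assumptions, since the abstract above is truncated before it specifies them.

```python
import numpy as np

def info_nce(feat_q, feat_fp, temperature=0.1):
    """InfoNCE-style loss between quantized and full-precision features.

    feat_q, feat_fp: arrays of shape (N, D); row i of each comes from image i.
    Positive pair: (feat_q[i], feat_fp[i]).
    Negatives (assumption): feat_fp[j] for j != i within the batch.
    """
    # L2-normalize so the dot product is cosine similarity.
    feat_q = feat_q / np.linalg.norm(feat_q, axis=1, keepdims=True)
    feat_fp = feat_fp / np.linalg.norm(feat_fp, axis=1, keepdims=True)
    logits = feat_q @ feat_fp.T / temperature          # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching image (the diagonal) as the target.
    return float(-np.mean(np.diag(log_prob)))

# Usage: quantized features close to their full-precision counterparts
# should score a lower loss than features paired with the wrong image.
rng = np.random.default_rng(0)
fp = rng.standard_normal((8, 32))
fq = fp + 0.01 * rng.standard_normal((8, 32))
matched = info_nce(fq, fp)
mismatched = info_nce(np.roll(fq, 1, axis=0), fp)
```

A plain reconstruction loss such as MSE would only use the diagonal terms; the off-diagonal terms in the softmax denominator are what add the "distinguishing" pressure.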

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Image Enhancement Techniques · Visual Attention and Saliency Detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Layer Normalization · Residual Connection · Vision Transformer