Analysis of Quantization on MLP-based Vision Models
Lingran Zhao, Zhen Dong, Kurt Keutzer

TL;DR
This paper investigates why directly quantizing MLP-based vision models degrades accuracy and proposes techniques that preserve it, retaining 79.68% ImageNet accuracy at 8 bits and 78.47% at 4 bits.
Contribution
The paper applies LayerNorm, bounded activation functions, percentile quantization, and improved modules to quantize MLP-based models without significant accuracy loss.
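As a rough illustration of what percentile quantization means, the sketch below compares a min-max clipping range with a percentile-based one on synthetic weights. It is a minimal NumPy example under assumptions of our own, not the authors' implementation: the helper names `quantize_symmetric` and `percentile_clip`, the 99.9th percentile, the symmetric quantizer, and the synthetic weight distribution are all hypothetical choices made for the example.

```python
import numpy as np

def quantize_symmetric(x, clip, bits):
    """Uniform symmetric fake-quantization of x onto the range [-clip, clip]."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit signed integers
    scale = clip / qmax                 # width of one quantization step
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                    # map back to floats to measure the error

def percentile_clip(x, pct=99.9):
    """Clipping threshold taken from a percentile of |x| (pct is an assumed value)."""
    return float(np.percentile(np.abs(x), pct))

# Synthetic weights (assumption): a narrow Gaussian bulk plus two large outliers.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=10_000)
weights[:2] = [0.5, -0.5]

bits = 4
minmax_clip = float(np.max(np.abs(weights)))   # range dictated by the outliers
pct_clip = percentile_clip(weights, 99.9)      # range dictated by the bulk

mse_minmax = np.mean((weights - quantize_symmetric(weights, minmax_clip, bits)) ** 2)
mse_pct = np.mean((weights - quantize_symmetric(weights, pct_clip, bits)) ** 2)

print(f"min-max clip    = {minmax_clip:.4f}, MSE = {mse_minmax:.2e}")
print(f"percentile clip = {pct_clip:.4f}, MSE = {mse_pct:.2e}")
```

Percentile clipping accepts a large error on the few clipped outliers in exchange for a finer step size on the remaining weights. Whether that trade lowers the overall error depends on how many outliers there are and how extreme they are.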
Findings
Achieves 79.68% accuracy on ImageNet with 8-bit quantization.
Maintains 78.47% accuracy with 4-bit quantization.
Proposes techniques that mitigate quantization-induced accuracy degradation.
Abstract
Quantization is widely adopted as a model compression technique, which obtains efficient models by converting floating-point weights and activations in the neural network into lower-bit integers. Quantization has been proven to work well on convolutional neural networks and transformer-based models. Despite the strong performance of these models, recent works have shown that MLP-based models are able to achieve comparable results on various tasks ranging from computer vision and NLP to 3D point clouds, while achieving higher throughput due to their parallelism and network simplicity. However, as we show in the paper, directly applying quantization to MLP-based models leads to significant accuracy degradation. Based on our analysis, two major issues account for the accuracy gap: 1) the range of activations in MLP-based models can be too large to quantize, and 2) specific components in the MLP-based…
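To make the first issue concrete, the sketch below maps floating-point activations to unsigned 8-bit integers with a generic min-max affine quantizer and shows how a single large value coarsens the step size for every other value. The abstract does not specify a quantizer, so the functions `quantize_affine` and `dequantize_affine`, the synthetic activations, and the outlier value of 200 are assumptions made purely for illustration.

```python
import numpy as np

def quantize_affine(x, bits):
    """Map floats to unsigned integers using a min-max affine quantizer."""
    qmax = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax                     # step size grows with the range
    zero_point = int(round(-lo / scale))         # integer that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize_affine(q, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return (q.astype(np.float64) - zero_point) * scale

rng = np.random.default_rng(0)
narrow = rng.normal(0.0, 1.0, size=4096)   # well-behaved activations (assumed)
wide = narrow.copy()
wide[-1] = 200.0                            # one large activation widens the range

q_n, scale_n, zp_n = quantize_affine(narrow, bits=8)
q_w, scale_w, zp_w = quantize_affine(wide, bits=8)

# Round-trip error on the shared values only (everything except the last element).
err_narrow = np.mean((narrow[:-1] - dequantize_affine(q_n, scale_n, zp_n)[:-1]) ** 2)
err_wide = np.mean((wide[:-1] - dequantize_affine(q_w, scale_w, zp_w)[:-1]) ** 2)

print(f"narrow range: scale = {scale_n:.4f}, MSE = {err_narrow:.2e}")
print(f"wide range:   scale = {scale_w:.4f}, MSE = {err_wide:.2e}")
```

Because the step size is the range divided by the number of integer levels, widening the range with a few extreme activations raises the rounding error on all the ordinary ones.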
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Infrared Target Detection Methodologies · Advanced Vision and Imaging
