TL;DR
This paper systematically investigates how quantization affects the compute cost-quality tradeoff of ResNet models. It finds that 4-bit quantization often offers the best tradeoff and reports state-of-the-art ImageNet accuracy for 4-bit ResNet-50.
Contribution
The study demonstrates that 4-bit quantized ResNet models outperform higher-precision models in cost-quality tradeoffs and provides a practical, hardware-aware quantization method with an open-source library.
Findings
4-bit quantization yields the best Pareto tradeoff curve.
State-of-the-art 77.09% top-1 accuracy on ImageNet for 4-bit ResNet-50.
Quantization acts as a regularizer, reducing the generalization gap.
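To make "4-bit quantization" concrete, here is a minimal sketch of a generic symmetric uniform quantizer in plain Python. It is not the paper's method or its library, whose details are not given on this page. The function names `quantize` and `dequantize`, the per-tensor scale, and the sample weights are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def quantize(values: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Map floats to signed integer codes in [-qmax, qmax] with one shared scale.

    Illustrative assumption: symmetric, per-tensor scaling. Real quantization
    schemes (including the one studied in the paper) may differ.
    """
    qmax = 2 ** (bits - 1) - 1               # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 1.0        # all-zero input: any scale works
    scale = max_abs / qmax                   # float value of one integer step
    codes = []
    for v in values:
        q = round(v / scale)                 # Python rounds ties to even
        codes.append(max(-qmax, min(qmax, q)))  # clamp to the representable range
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [q * scale for q in codes]


# Hypothetical weights, not taken from any real model.
weights = [0.7, -0.33, 0.12, 0.0]
codes, scale = quantize(weights, bits=4)
approx = dequantize(codes, scale)
print(codes)   # integer codes, each storable in 4 bits
print(approx)  # values the network would effectively compute with
```

With 4 bits, each weight is restricted to one of 15 levels, so the rounding error per value is at most half a step (`scale / 2`). That bounded error is what the model's accuracy has to tolerate in exchange for the lower compute cost.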
Abstract
Quantization has become a popular technique to compress neural networks and reduce compute cost, but most prior work focuses on studying quantization without changing the network size. Many real-world applications of neural networks have compute cost and memory budgets, which can be traded off with model quality by changing the number of parameters. In this work, we use ResNet as a case study to systematically investigate the effects of quantization on inference compute cost-quality tradeoff curves. Our results suggest that for each bfloat16 ResNet model, there are quantized models with lower cost and higher accuracy; in other words, the bfloat16 compute cost-quality tradeoff curve is Pareto-dominated by the 4-bit and 8-bit curves, with models primarily quantized to 4-bit yielding the best Pareto curve. Furthermore, we achieve state-of-the-art results on ImageNet for 4-bit ResNet-50…
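The claim that one cost-quality curve is "Pareto-dominated" by another can be checked mechanically: a model is dominated if some other model costs no more and is at least as accurate, with a strict improvement on at least one axis. The sketch below illustrates that definition only. The model names, cost units, and accuracy numbers are invented for the example and are not results from the paper.

```python
from typing import List, Tuple

# (name, inference cost, top-1 accuracy); lower cost and higher accuracy are better.
Model = Tuple[str, float, float]


def dominates(a: Model, b: Model) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    _, cost_a, acc_a = a
    _, cost_b, acc_b = b
    no_worse = cost_a <= cost_b and acc_a >= acc_b
    strictly_better = cost_a < cost_b or acc_a > acc_b
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Made-up points for illustration only; NOT measurements from the paper.
models = [
    ("bf16-small", 4.0, 70.0),
    ("bf16-large", 8.0, 74.0),
    ("int8-large", 5.0, 74.5),
    ("int4-large", 3.0, 73.0),
    ("int4-xlarge", 4.5, 75.0),
]

front = pareto_front(models)
print([name for name, _, _ in front])
```

In this toy data set every bfloat16 point is dominated by some quantized point, which is the shape of result the abstract describes. The quadratic scan is fine for a handful of models; sorting by cost first would scale better for large sweeps.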
Taxonomy
Methods: 1x1 Convolution · Average Pooling · Batch Normalization · Residual Connection · Kaiming Initialization · Residual Block · Global Average Pooling · Bottleneck Residual Block · Max Pooling
