QwT-v2: Practical, Effective and Efficient Post-Training Quantization
Ningyuan Tang, Minghao Fu, Hao Yu, Jianxin Wu

TL;DR
QwT-v2 is an improved post-training quantization method. It uses a lightweight channel-wise affine compensation module to reduce resource consumption and improve hardware compatibility, while matching or exceeding the accuracy of QwT.
Contribution
QwT-v2 replaces QwT's additional structures with a lightweight channel-wise affine compensation (CWAC) module. CWAC needs far fewer extra parameters and computations, improves hardware compatibility, and matches or exceeds the original QwT in accuracy.
Findings
QwT-v2 matches or outperforms QwT in accuracy.
QwT-v2 significantly reduces extra parameters and latency.
QwT-v2 is compatible with most existing hardware platforms.
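The claim about extra parameters can be made concrete with a rough per-block count. This count rests on two assumptions that the text above does not state. First, QwT adds one dense d×d linear layer with bias per block. Second, CWAC adds one scale and one shift per channel. The hidden width d = 768 is a hypothetical example, and the helper names are illustrative.

```python
def qwt_linear_params(d: int) -> int:
    """Assumed QwT overhead per block: a dense d-to-d linear layer plus bias."""
    return d * d + d


def cwac_params(d: int) -> int:
    """Assumed CWAC overhead per block: one scale and one shift per channel."""
    return 2 * d


if __name__ == "__main__":
    d = 768  # hypothetical hidden width (e.g. a ViT-Base block)
    print(qwt_linear_params(d), cwac_params(d))  # 590592 1536
```

Under these assumptions, the per-block overhead drops from quadratic to linear in d. The reduction factor is (d + 1)/2, which is 384.5 at d = 768.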
Abstract
Network quantization is arguably one of the most practical network compression approaches for reducing the enormous resource consumption of modern deep neural networks. However, existing quantization methods usually require diverse and subtle design choices tailored to specific architectures and tasks. In contrast, QwT is a simple and general approach that introduces lightweight additional structures to improve quantization. Yet QwT incurs extra parameters and latency. More importantly, QwT is not compatible with many hardware platforms. In this paper, we propose QwT-v2, which not only retains all the advantages of QwT but also resolves its major defects. By adopting a very lightweight channel-wise affine compensation (CWAC) module, QwT-v2 introduces significantly fewer extra parameters and less computation than QwT, while matching or even outperforming QwT in accuracy. The compensation module of QwT-v2 can be…
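The abstract does not spell out the CWAC formulation, so the following is only a minimal sketch. It assumes that CWAC applies a per-channel scale `a` and shift `b` to a quantized block's output. It also assumes that these values are fitted in closed form by per-channel least squares against the full-precision output on calibration data.

The fold step shows one plausible reason such a module could be hardware friendly. It assumes deployment already dequantizes each channel as `scale * acc + shift`. Under that assumption, a per-channel affine map can be absorbed into those existing constants, so no extra operator is needed at inference time. The names `fit_cwac`, `apply_cwac`, and `fold_into_dequant` are hypothetical.

```python
import numpy as np


def fit_cwac(y_quant: np.ndarray, y_fp: np.ndarray, eps: float = 1e-8):
    """Hypothetical CWAC fit: per-channel a, b so that a * y_quant + b ~ y_fp.

    y_quant, y_fp: (num_samples, channels) outputs of the quantized and
    full-precision block on calibration data. Each channel is solved
    independently by ordinary least squares with an intercept.
    """
    mu_q = y_quant.mean(axis=0)
    mu_f = y_fp.mean(axis=0)
    dq = y_quant - mu_q
    df = y_fp - mu_f
    var_q = (dq * dq).mean(axis=0)
    cov = (dq * df).mean(axis=0)
    # Channels whose quantized output is (near-)constant keep a unit scale.
    a = np.where(var_q > eps, cov / np.maximum(var_q, eps), 1.0)
    b = mu_f - a * mu_q
    return a, b


def apply_cwac(y_quant: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Apply the per-channel affine correction (broadcast over the last axis)."""
    return y_quant * a + b


def fold_into_dequant(scale, shift, a, b):
    """Fold a * (scale * acc + shift) + b into new per-channel dequant constants."""
    return a * scale, a * shift + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_fp = rng.normal(size=(1024, 8))
    # Toy stand-in for a quantized block: rounding plus a systematic bias.
    y_q = np.round(y_fp * 4) / 4 * 0.9 + 0.05
    a, b = fit_cwac(y_q, y_fp)
    print("MSE before:", np.mean((y_q - y_fp) ** 2))
    print("MSE after: ", np.mean((apply_cwac(y_q, a, b) - y_fp) ** 2))
```

The per-channel least-squares fit includes the identity map (`a = 1`, `b = 0`) as a candidate. On the calibration data, therefore, the corrected error can never exceed the uncorrected error.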
