PalQuant: Accelerating High-precision Networks on Low-precision Accelerators
Qinghao Hu, Gang Li, Qiman Wu, Jian Cheng

TL;DR
PalQuant is a novel method that enables high-precision neural networks to run efficiently on low-precision accelerators by learning parallel low-precision representations and using a cyclic shuffle module, improving accuracy and speed.
Contribution
The paper introduces PalQuant, a new approach for approximating high-precision computations with parallel low-precision representations and a cyclic shuffle module, enhancing performance on low-precision hardware.
Findings
On ResNet-18, PalQuant achieves 0.52% higher accuracy than the state-of-the-art 4-bit counterpart.
At the same time, it delivers a 1.78× inference speedup on a practical 2-bit accelerator.
In the reported experiments, it outperforms state-of-the-art quantization methods in both accuracy and inference speed.
Abstract
Recently, low-precision deep learning accelerators (DLAs) have become popular due to their advantages in chip area and energy consumption, yet the low-precision quantized models on these DLAs suffer severe accuracy degradation. One way to achieve both high accuracy and efficient inference is to deploy high-precision neural networks on low-precision DLAs, which is rarely studied. In this paper, we propose the PArallel Low-precision Quantization (PalQuant) method that approximates high-precision computations via learning parallel low-precision representations from scratch. In addition, we present a novel cyclic shuffle module to boost the cross-group information communication between parallel low-precision groups. Extensive experiments demonstrate that PalQuant has superior performance to state-of-the-art quantization methods in both accuracy and inference speed, e.g., for ResNet-18…
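As a rough illustration of the cross-group communication idea mentioned in the abstract, the Python sketch below shows one possible way a cyclic exchange between parallel groups could look. The function name cyclic_shuffle, the chunking scheme, and the string channel labels are assumptions made for illustration only; the abstract does not specify the module's actual layout, so this should not be read as the authors' implementation.

```python
def cyclic_shuffle(groups):
    """Mix channels across parallel groups with a cyclic offset.

    `groups` is a list of G equal-length lists, one per parallel group.
    Each group is cut into G chunks; output group g takes chunk j from
    input group (g + j) % G, so every output group receives channels
    from every input group.

    NOTE: this layout is a hypothetical illustration, not the paper's
    definition of its cyclic shuffle module.
    """
    num_groups = len(groups)
    channels = len(groups[0])
    if any(len(group) != channels for group in groups):
        raise ValueError("all groups must have the same number of channels")
    if channels % num_groups != 0:
        raise ValueError("channels per group must be divisible by the group count")

    chunk = channels // num_groups
    shuffled = []
    for g in range(num_groups):
        out = []
        for j in range(num_groups):
            # Pick the j-th chunk from the group that is j steps ahead, cyclically.
            src = groups[(g + j) % num_groups]
            out.extend(src[j * chunk:(j + 1) * chunk])
        shuffled.append(out)
    return shuffled


# Two parallel groups with four channels each, labelled by their origin.
groups = [["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"]]
mixed = cyclic_shuffle(groups)
```

With these inputs, mixed is [["a0", "a1", "b2", "b3"], ["b0", "b1", "a2", "a3"]]: each output group now holds channels that originated in both input groups, while the total set of channels is unchanged.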
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
