Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs
Dingyi Dai, Yichi Zhang, Jiahao Zhang, Zhanqiu Hu, Yaohui Cai, Qi Sun, Zhiru Zhang

TL;DR
This paper introduces QFX, a trainable fixed-point quantization method for deep learning on FPGAs that learns binary-point positions during training and reduces hardware complexity, leading to improved accuracy and efficiency.
Contribution
QFX is a novel, differentiable, trainable fixed-point quantization approach that automatically learns binary-point positions and minimizes DSP usage for FPGA deployment.
Findings
QFX achieves higher accuracy than post-training quantization on CIFAR-10 and ImageNet.
QFX enables models to be deployed with minimal effort and hardware overhead.
Multiplier-free quantization reduces DSP usage in FPGA accelerators.
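One common way to make a layer multiplier-free is to restrict each weight to a signed power of two, so that every multiplication becomes a bit shift. On FPGAs, shifts can be built from routing and LUTs, whereas general multiplications typically occupy DSP blocks. The sketch below illustrates that idea only. It assumes a single power-of-two term per weight, which the summary above does not confirm as the QFX scheme, and the names to_power_of_two and shift_multiply and the exponent range are hypothetical.

```python
import math


def to_power_of_two(w, min_exp=-8, max_exp=0):
    """Approximate w by sign * 2**exp and return (sign, exp).

    Assumption: one power-of-two term per weight, exponent clamped to
    [min_exp, max_exp]. Zero is encoded with sign 0.
    """
    if w == 0:
        return 0, min_exp
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))          # nearest exponent in the log domain
    exp = max(min_exp, min(max_exp, exp))   # keep the exponent in range
    return sign, exp


def shift_multiply(x_code, sign, exp):
    """Multiply an integer activation code by sign * 2**exp using shifts only."""
    if exp >= 0:
        shifted = x_code << exp
    else:
        shifted = x_code >> -exp  # arithmetic right shift, rounds toward -infinity
    return sign * shifted


# Usage: the weight 0.26 is approximated by 2**-2, so the product is a shift by 2.
sign, exp = to_power_of_two(0.26)
product = shift_multiply(64, sign, exp)
```

The right shift discards low-order bits, so a hardware design would normally keep enough fractional bits in the activation code to absorb the largest shift.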
Abstract
Quantization is a crucial technique for deploying deep learning models on resource-constrained devices, such as embedded FPGAs. Prior efforts mostly focus on quantizing matrix multiplications, leaving other layers like BatchNorm or shortcuts in floating-point form, even though fixed-point arithmetic is more efficient on FPGAs. A common practice is to fine-tune a pre-trained model to fixed-point for FPGA deployment, but this can degrade accuracy. This work presents QFX, a novel trainable fixed-point quantization approach that automatically learns the binary-point position during model training. Additionally, we introduce a multiplier-free quantization strategy within QFX to minimize DSP usage. QFX is implemented as a PyTorch-based library that efficiently emulates fixed-point arithmetic, supported by FPGA HLS, in a differentiable manner during backpropagation. With minimal effort,…
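To make the role of the binary-point position concrete, the following is a minimal pure-Python sketch of signed fixed-point emulation. It is not the QFX library or its API: the function names and the round-half-up and saturation behavior are assumptions, and the brute-force search in best_int_bits is only a stand-in for the gradient-based learning the abstract describes.

```python
import math


def to_fixed_point(x, word_bits, int_bits):
    """Emulate a signed fixed-point value in floating point.

    word_bits: total bit width, including the sign bit.
    int_bits:  bits left of the binary point, including the sign bit.
    Assumptions: round-half-up rounding and saturation on overflow.
    """
    frac_bits = word_bits - int_bits
    scale = 2.0 ** frac_bits
    q_min = -(2 ** (word_bits - 1))
    q_max = 2 ** (word_bits - 1) - 1
    code = math.floor(x * scale + 0.5)   # nearest integer code
    code = max(q_min, min(q_max, code))  # saturate to the representable range
    return code / scale


def best_int_bits(values, word_bits):
    """Pick the binary-point position with the lowest squared error.

    Exhaustive search, used here only for illustration; ties resolve to the
    smallest int_bits.
    """
    def squared_error(int_bits):
        return sum((v - to_fixed_point(v, word_bits, int_bits)) ** 2
                   for v in values)
    return min(range(1, word_bits + 1), key=squared_error)


# Usage: small values favor fractional bits, larger values need integer bits.
small = best_int_bits([0.1, -0.2, 0.3], word_bits=8)
large = best_int_bits([3.5, -2.25, 1.0], word_bits=8)
```

Moving the binary point trades range for resolution at a fixed word length, which is why a position chosen per tensor can matter for accuracy.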
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Image Processing Techniques and Applications · Advanced Image Processing Techniques
Methods: Lib · Focus
