Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators
Yuhao Liu, Salim Ullah, Akash Kumar

TL;DR
This paper introduces a runtime-reconfigurable bitwise systolic array architecture for multi-precision quantized neural network acceleration, enabling faster inference and higher clock frequencies on FPGA.
Contribution
It presents a novel hardware design that supports dynamic precision reconfiguration for QNNs, addressing limitations of previous fixed-precision accelerators.
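This page does not describe the processing-element (PE) microarchitecture. The following is a minimal sketch, assuming unsigned operands and a plain AND-and-shift decomposition, of how a bitwise multiplier can treat precision as a runtime parameter rather than fixing it in hardware. Any product of an `x_bits`-bit activation and a `w_bits`-bit weight can be written as a sum of 1-bit partial products, x·w = Σᵢ Σⱼ xᵢ wⱼ 2^(i+j). The function name `bitwise_multiply` is hypothetical, and this is not the authors' design.

```python
def bitwise_multiply(x: int, w: int, x_bits: int, w_bits: int) -> int:
    """Multiply two unsigned integers using only 1-bit AND partial products.

    Hypothetical helper for illustration. Precision (x_bits, w_bits) is an
    ordinary runtime argument, so the same routine serves 2-, 4- or 8-bit
    operands without any structural change.
    """
    if not (0 <= x < (1 << x_bits) and 0 <= w < (1 << w_bits)):
        raise ValueError("operand does not fit in the requested precision")
    acc = 0
    for i in range(x_bits):
        x_i = (x >> i) & 1                  # i-th bit of the activation
        for j in range(w_bits):
            w_j = (w >> j) & 1              # j-th bit of the weight
            acc += (x_i & w_j) << (i + j)   # shifted 1-bit partial product
    return acc


# The precision is chosen per call, e.g. per layer of a mixed-precision model.
print(bitwise_multiply(13, 11, x_bits=4, w_bits=4))  # 143
print(bitwise_multiply(13, 3, x_bits=4, w_bits=2))   # 39
```

In hardware, the loop bounds would correspond to how many 1-bit units are grouped together for one multiplication. Lowering the precision frees units that could serve other operands. This is an assumption about the general technique, not a claim about the paper's grouping scheme.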
Findings
Achieves a 1.3185× to 3.5671× speedup in mixed-precision inference
Supports a higher clock frequency of 250 MHz
Reduces critical path delay for improved performance
Abstract
Neural network accelerators have been widely applied to edge devices for complex tasks like object tracking, image recognition, etc. Previous works have explored quantization techniques in related lightweight accelerator designs to reduce hardware resource consumption. However, low precision leads to high accuracy loss in inference. Therefore, mixed-precision quantization has become an alternative solution, applying different precisions to different layers to trade off resource consumption and accuracy. Because regular hardware multiplier designs cannot support precision reconfiguration for a multi-precision Quantized Neural Network (QNN) model at runtime, we propose a runtime-reconfigurable multi-precision multi-channel bitwise systolic array design for QNN accelerators. We have implemented and evaluated our work on the Ultra96 FPGA platform. Results show that our…
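To show how a systolic array can reuse one set of PEs across layers with different precisions, here is a cycle-level sketch of an output-stationary array whose PEs multiply bitwise. The `bits` argument is set at call time, mimicking per-layer reconfiguration. Several choices here are assumptions for illustration and are not taken from the paper: the output-stationary dataflow, the unsigned operands, the single precision shared by activations and weights, and the names `systolic_matmul` and `bitwise_multiply`.

```python
from typing import List

Matrix = List[List[int]]


def bitwise_multiply(x: int, w: int, bits: int) -> int:
    """Unsigned multiply assembled from shifted 1-bit AND partial products."""
    acc = 0
    for i in range(bits):
        for j in range(bits):
            acc += (((x >> i) & 1) & ((w >> j) & 1)) << (i + j)
    return acc


def systolic_matmul(a: Matrix, b: Matrix, bits: int) -> Matrix:
    """Cycle-level sketch of an output-stationary systolic array (hypothetical).

    PE (r, c) accumulates out[r][c]. Activations from `a` enter on the left and
    move one PE right per cycle; weights from `b` enter at the top and move one
    PE down per cycle. Row r and column c are skewed by r and c cycles so that
    a[r][k] and b[k][c] meet in PE (r, c) on the same cycle.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    limit = 1 << bits
    if any(not 0 <= v < limit for row in a + b for v in row):
        raise ValueError(f"all operands must be unsigned {bits}-bit values")
    acc = [[0] * n for _ in range(m)]
    a_reg = [[0] * n for _ in range(m)]  # activation currently held by each PE
    b_reg = [[0] * n for _ in range(m)]  # weight currently held by each PE
    for t in range(k + m + n - 2):       # cycles until the last operand pair meets
        new_a = [[0] * n for _ in range(m)]
        new_b = [[0] * n for _ in range(m)]
        for r in range(m):
            for c in range(n):
                if c > 0:
                    new_a[r][c] = a_reg[r][c - 1]  # shift in from the left
                elif 0 <= t - r < k:
                    new_a[r][c] = a[r][t - r]      # skewed feed into column 0
                if r > 0:
                    new_b[r][c] = b_reg[r - 1][c]  # shift in from above
                elif 0 <= t - c < k:
                    new_b[r][c] = b[t - c][c]      # skewed feed into row 0
                acc[r][c] += bitwise_multiply(new_a[r][c], new_b[r][c], bits)
        a_reg, b_reg = new_a, new_b
    return acc


# Same array, a different precision per "layer".
print(systolic_matmul([[1, 2], [3, 0]], [[2, 1], [1, 3]], bits=2))  # [[4, 7], [6, 3]]
print(systolic_matmul([[200, 17]], [[3], [255]], bits=8))           # [[4935]]
```

The output-stationary dataflow was picked only because it keeps the sketch short. This page does not specify the paper's actual dataflow or how its multi-channel mode partitions the bit-level units.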
Taxonomy
Topics: Advanced Neural Network Applications · Numerical Methods and Algorithms · Network Packet Processing and Optimization
