Low Precision Constant Parameter CNN on FPGA
Thiam Khean Hah, Yeong Tat Liew, Jason Ong

TL;DR
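This paper presents FPGA implementations of low-precision, sparse CNN convolution layers with constant parameters. Corner-case ResNet50 residual blocks reach an effective 131 and 23 TOP/chip, and the projected throughput for all ResNet50 convolution layers is 1.37x higher than the V100 GPU upper bound after normalizing for sparsity.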
Contribution
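It introduces techniques for optimizing low-precision, sparse, constant-parameter CNN convolution layers on FPGA, including amortizing the cost of common-factor multiplication and automatically leveraging dense, hand-tuned LUT structures. The techniques are applied to corner-case residual blocks of a sparse ResNet50 model.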
Findings
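Achieved an effective performance of 131 and 23 TOP/chip for the corner-case residual blocks.
Projected 10k images/sec/chip at batch size 2 for a multichip persistent implementation of all ResNet50 convolution layers.
Projected performance is 1.37x higher than the V100 GPU upper bound at the same batch size after normalizing for sparsity.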
Abstract
We report FPGA implementation results of low-precision CNN convolution layers optimized for sparse and constant parameters. We describe techniques that amortize the cost of common factor multiplication and automatically leverage dense hand-tuned LUT structures. We apply this method to corner-case residual blocks of a sparse ResNet50 model to assess achievable utilization and frequency, and demonstrate an effective performance of 131 and 23 TOP/chip for the corner-case blocks. The projected performance of a multichip persistent implementation of all ResNet50 convolution layers is 10k im/s/chip at batch size 2. This is 1.37x higher than the V100 GPU upper bound at the same batch size after normalizing for sparsity.
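The abstract does not spell out how the common factor multiplication is amortized. One plausible reading, offered here only as an illustrative assumption and not as the authors' implementation, is that when weights are constant and low precision, many of them share the same value, so the inputs that share a weight can be added first and multiplied once. The Python sketch below shows that idea for a single dot product; the weight and activation values and the function names are hypothetical.

```python
from collections import defaultdict

def dot_naive(weights, activations):
    """Reference dot product: one multiply per weight."""
    return sum(w * a for w, a in zip(weights, activations))

def dot_shared_factor(weights, activations):
    """Dot product that multiplies once per distinct nonzero weight value.

    Assumption for illustration: weights are constants known ahead of time,
    so activations sharing a weight value can be summed before multiplying.
    Returns (result, number_of_multiplies).
    """
    sums = defaultdict(int)
    for w, a in zip(weights, activations):
        if w != 0:          # sparse: zero weights contribute nothing
            sums[w] += a    # add inputs that share the same constant factor
    result = sum(w * s for w, s in sums.items())
    return result, len(sums)

# Hypothetical constant, sparse, low-precision weights and integer activations.
weights = [0, 3, -1, 3, 0, 0, -1, 3]
activations = [5, 2, 7, 1, 9, 4, 6, 8]

result, n_mul = dot_shared_factor(weights, activations)
print(result, n_mul)
```

In this toy case the grouped version needs two multiplies instead of eight, while producing the same result as the naive version. How such sharing maps onto FPGA LUT structures is outside what this sketch covers.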
Taxonomy
Topics: Advanced Neural Network Applications · Model Reduction and Neural Networks · Image Processing Techniques and Applications
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
