Computation Error Analysis of Block Floating Point Arithmetic Oriented Convolution Neural Network Accelerator Design
Zhourui Song, Zhenyu Liu, Dongsheng Wang

TL;DR
This paper investigates how block floating point (BFP) arithmetic affects CNN accuracy and efficiency, showing that BFP with an 8-bit mantissa loses less than 0.3% accuracy without retraining, and derives theoretical error bounds for CNN accelerator design.
Contribution
It verifies the effects of BFP word width on CNN performance without retraining and develops a theoretical noise-to-signal ratio bound for BFP-based CNN accelerators.
Findings
8-bit mantissa BFP causes less than 0.3% accuracy loss
Theoretical NSR upper bound guides BFP CNN design
BFP reduces hardware cost and data traffic
Abstract
The heavy burdens of computation and off-chip traffic impede the deployment of large-scale convolutional neural networks (CNNs) on embedded platforms. Because CNNs are highly tolerant of computation errors, employing block floating point (BFP) arithmetic in CNN accelerators can reduce hardware cost and data traffic while maintaining classification accuracy. In this paper, we verify the effects of BFP word-width definitions on CNN performance without retraining. Several typical CNN models, including VGG16, ResNet-18, ResNet-50 and GoogLeNet, were tested. Experiments revealed that an 8-bit mantissa, including the sign bit, in the BFP representation induced less than 0.3% accuracy loss. In addition, we investigate the computational errors in theory and develop the noise-to-signal ratio (NSR) upper bound, which provides the promising guidance for BFP…
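To make the abstract's terms concrete, here is a minimal sketch of block floating point quantization with an 8-bit mantissa (sign bit included) and an empirical noise-to-signal ratio. It is an illustration under stated assumptions, not the authors' implementation: the function names `bfp_quantize`, `bfp_dequantize`, and `nsr` are hypothetical, the shared exponent is taken from the largest magnitude in the block, and round-to-nearest with saturation is assumed. The paper's block partitioning and rounding scheme may differ.

```python
import math
import random


def bfp_quantize(block, mantissa_bits=8):
    """Quantize floats to one shared exponent plus signed integer mantissas.

    Assumption: mantissa_bits counts the sign bit, so magnitudes get
    mantissa_bits - 1 bits.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp returns (m, e) with max_abs = m * 2**e and 0.5 <= m < 1.
    _, shared_exp = math.frexp(max_abs)
    magnitude_bits = mantissa_bits - 1
    step = math.ldexp(1.0, shared_exp - magnitude_bits)  # value of one LSB
    limit = (1 << magnitude_bits) - 1                    # saturate symmetrically
    mantissas = [max(-limit, min(limit, round(x / step))) for x in block]
    return shared_exp, mantissas


def bfp_dequantize(shared_exp, mantissas, mantissa_bits=8):
    """Map integer mantissas back to floats using the shared exponent."""
    step = math.ldexp(1.0, shared_exp - (mantissa_bits - 1))
    return [m * step for m in mantissas]


def nsr(original, approx):
    """Empirical noise-to-signal ratio: error energy over signal energy."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, approx))
    return noise / signal


if __name__ == "__main__":
    random.seed(0)
    block = [random.uniform(-1.0, 1.0) for _ in range(64)]
    exp, mants = bfp_quantize(block)
    restored = bfp_dequantize(exp, mants)
    print("shared exponent:", exp)
    print("empirical NSR:", nsr(block, restored))
```

Small values in a block lose relative precision because they share the exponent of the largest value. That is the error source an NSR bound has to account for, and it is why block size and mantissa width trade off against accuracy.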
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Neural Networks and Applications
Methods: 1x1 Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Dropout · Dense Connections · Max Pooling · Softmax
