FlexBlock: A Flexible DNN Training Accelerator with Multi-Mode Block Floating Point Support
Seock-Hwan Noh, Jahyun Koo, Seunghyun Lee, Jongse Park, Jaeha Kung

TL;DR
FlexBlock is a versatile DNN training accelerator supporting multiple block floating point precisions, significantly improving training speed and energy efficiency while maintaining accuracy across various neural network models.
Contribution
It introduces a flexible BFP-based training accelerator supporting multiple precisions and high core utilization for diverse layer types, unlike prior fixed-precision approaches.
Findings
Training speed improved by 1.5 to 5.3 times.
Energy efficiency increased by 2.4 to 7.0 times.
Achieved marginal accuracy loss compared to full-precision training.
Abstract
Training deep neural networks (DNNs) is a computationally expensive job, which can take weeks or months even with high performance GPUs. As a remedy for this challenge, community has started exploring the use of more efficient data representations in the training process, e.g., block floating point (BFP). However, prior work on BFP-based DNN accelerators rely on a specific BFP representation making them less versatile. This paper builds upon an algorithmic observation that we can accelerate the training by leveraging multiple BFP precisions without compromising the finally achieved accuracy. Backed up by this algorithmic opportunity, we develop a flexible DNN training accelerator, dubbed FlexBlock, which supports three different BFP precision modes, possibly different among activation, weight, and gradient tensors. While several prior works proposed such multi-precision support for DNN…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Medical Image Segmentation Techniques · Machine Learning and Data Classification
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
