FlexBlock: A Flexible DNN Training Accelerator with Multi-Mode Block Floating Point Support

Seock-Hwan Noh; Jahyun Koo; Seunghyun Lee; Jongse Park; Jaeha Kung

arXiv:2203.06673·cs.LG·March 15, 2022

TL;DR

FlexBlock is a versatile DNN training accelerator supporting multiple block floating point precisions, significantly improving training speed and energy efficiency while maintaining accuracy across various neural network models.

Contribution

FlexBlock is a flexible BFP-based training accelerator that supports multiple precisions and maintains high core utilization across diverse layer types, unlike prior fixed-precision approaches.

Findings

01 Training speed improved by 1.5× to 5.3×.

02 Energy efficiency improved by 2.4× to 7.0×.

03 Accuracy loss is marginal compared to full-precision training.

Abstract

Training deep neural networks (DNNs) is computationally expensive and can take weeks or months even with high-performance GPUs. As a remedy, the community has started exploring more efficient data representations for training, e.g., block floating point (BFP). However, prior work on BFP-based DNN accelerators relies on a specific BFP representation, which makes these accelerators less versatile. This paper builds upon an algorithmic observation that training can be accelerated by leveraging multiple BFP precisions without compromising the final accuracy. Backed by this algorithmic opportunity, we develop a flexible DNN training accelerator, dubbed FlexBlock, which supports three different BFP precision modes, possibly different among activation, weight, and gradient tensors. While several prior works proposed such multi-precision support for DNN…
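To make the idea of block floating point concrete, the following is a minimal sketch of generic BFP quantization, in which a block of values shares one exponent and each value keeps only a short signed integer mantissa. It is not FlexBlock's actual number format or hardware datapath: the function names, the example block, and the mantissa widths (4, 8, and 16 bits) are assumptions chosen only to illustrate how a narrower mantissa trades accuracy for cheaper arithmetic.

```python
import math


def bfp_quantize(block, mantissa_bits):
    """Quantize floats to one shared exponent plus signed integer mantissas.

    Hypothetical helper for illustration; not the FlexBlock format.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so max_abs < 2**e.
    shared_exp = math.frexp(max_abs)[1]
    # One bit holds the sign, leaving mantissa_bits - 1 bits of magnitude.
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to the nearest integer and clamp to the representable range.
    mantissas = [max(-limit, min(limit, round(x * scale))) for x in block]
    return shared_exp, mantissas


def bfp_dequantize(shared_exp, mantissas, mantissa_bits):
    """Reconstruct approximate floats from a BFP block."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


if __name__ == "__main__":
    block = [0.5, -0.25, 0.125, 1.0]  # assumed example values
    for bits in (4, 8, 16):  # assumed mantissa widths, for illustration only
        exp, mants = bfp_quantize(block, bits)
        approx = bfp_dequantize(exp, mants, bits)
        worst = max(abs(a - b) for a, b in zip(block, approx))
        print(f"{bits:2d}-bit mantissa: exp={exp}, mantissas={mants}, max error={worst}")
```

Because every value in a block shares the exponent, multiply-accumulate operations within a block reduce to integer arithmetic on the mantissas, which is the general reason BFP is attractive for accelerators. Small values in a block with a large maximum lose the most precision, and this loss grows as the mantissa narrows.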


Taxonomy

Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Machine Learning and Data Classification

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings