DRACO: Co-Optimizing Hardware Utilization, and Performance of DNNs on Systolic Accelerator

Nandan Kumar Jha; Shreyas Ravishankar; Sparsh Mittal; Arvind Kaushik; Dipan Mandal; Mahesh Chandra

arXiv:2006.15103 · eess.SP · June 29, 2020

TL;DR

DRACO is an algorithm-level co-optimization method that enhances PE utilization and energy efficiency for memory-bound DNNs on systolic accelerators without hardware modifications, also improving predictive performance.

Contribution

The paper introduces DRACO, an algorithm-level co-optimization approach that addresses PE underutilization on systolic arrays and improves the predictive accuracy of DNNs.

Findings

1. 41.8% improvement in PE utilization
2. 42.6% reduction in inference latency
3. Negligible loss in predictive performance

Abstract

The number of processing elements (PEs) in a fixed-sized systolic accelerator is well matched for large and compute-bound DNNs; whereas, memory-bound DNNs suffer from PE underutilization and fail to achieve peak performance and energy efficiency. To mitigate this, specialized dataflow and/or micro-architectural techniques have been proposed. However, due to the longer development cycle and the rapid pace of evolution in the deep learning fields, these hardware-based solutions can be obsolete and ineffective in dealing with PE underutilization for state-of-the-art DNNs. In this work, we address the challenge of PE underutilization at the algorithm front and propose data reuse aware co-optimization (DRACO). This improves the PE utilization of memory-bound DNNs without any additional need for dataflow/micro-architecture modifications. Furthermore, unlike the previous co-optimization…
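
The utilization gap described in the abstract can be illustrated with a rough sketch. The example assumes a 64x64 array and a deliberately simple tiling model in which an operand is padded to whole array tiles. The function name `pe_utilization`, the array size, and the layer shapes are illustrative assumptions rather than details from the paper, and the model ignores dataflow and timing effects.

```python
import math


def pe_utilization(rows_needed: int, cols_needed: int,
                   array_rows: int = 64, array_cols: int = 64) -> float:
    """Fraction of PEs doing useful work under a simplified tiling model.

    Assumption: an operand of shape rows_needed x cols_needed is split into
    tiles of array_rows x array_cols, and partially filled tiles leave the
    remaining PEs idle.
    """
    row_tiles = math.ceil(rows_needed / array_rows)
    col_tiles = math.ceil(cols_needed / array_cols)
    occupied_slots = row_tiles * array_rows * col_tiles * array_cols
    return (rows_needed * cols_needed) / occupied_slots


# Hypothetical pointwise (1x1) convolution: 128 input channels, 256 output
# channels, so the weight operand is 128 x 256 and fills every tile.
pointwise = pe_utilization(128, 256)

# Hypothetical depthwise 3x3 convolution: each channel has its own 9 x 1
# filter, so a single channel occupies only 9 PEs of one tile.
depthwise = pe_utilization(9, 1)

print(f"pointwise utilization: {pointwise:.4f}")
print(f"depthwise utilization: {depthwise:.4f}")
```

Under these assumptions the pointwise layer fills the array, while the depthwise layer leaves almost all PEs idle. This is the kind of mismatch between layer shape and a fixed-size array that the abstract refers to as PE underutilization.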

Taxonomy

Methods: Pointwise Convolution · Depthwise Convolution · Softmax · Batch Normalization · Average Pooling · Depthwise Separable Convolution · 1x1 Convolution · Dense Connections · Global Average Pooling