On the Impact of Partial Sums on Interconnect Bandwidth and Memory Accesses in a DNN Accelerator

Mahesh Chandra

arXiv:2011.00850·cs.AR·February 25, 2021

TL;DR

This paper introduces a first-order analytical method for partitioning feature maps in DNN accelerators to minimize interconnect and memory bandwidth, reporting up to a 40% bandwidth reduction from optimal partitioning combined with an active memory controller.

Contribution

It proposes a first-order analytical approach for partitioning feature maps to minimize bandwidth in DNN accelerators.

Findings

01. Optimal partitioning reduces bandwidth by up to 40%.

02. Active memory controllers enable efficient bandwidth savings.

03. Analytical method provides a systematic partitioning strategy.

Abstract

Dedicated accelerators are being designed to address the huge resource requirements of deep neural network (DNN) applications. Power, performance, and area (PPA) constraints limit the number of MACs available in these accelerators. Convolution layers, which require a huge number of MACs, are often partitioned into multiple iterative sub-tasks. This puts huge pressure on available system resources such as interconnect and memory bandwidth. Optimal partitioning of the feature maps for these sub-tasks can reduce the bandwidth requirement substantially. Some accelerators avoid off-chip or interconnect transfers by implementing local memories; however, the memory accesses are still performed, and a reduced bandwidth can help save power in such architectures. In this paper, we propose a first-order analytical method to partition the feature maps for optimal bandwidth and…
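
To make the partitioning trade-off in the abstract concrete, here is a minimal Python sketch of a first-order traffic model. It is an illustration under stated assumptions, not the paper's formulation, which the truncated abstract does not spell out. The function names, the MAC constraint `m * n * k * k <= mac_budget`, and the choice to ignore weight traffic are all hypothetical. The model assumes each sub-task handles `m` of `M` input channels and `n` of `N` output channels, that input maps are re-read once per output-channel group, and that partial sums are written once per input-channel group and read back before each later accumulation. Setting `active_controller=True` models a memory controller that accumulates in place, so the partial-sum reads disappear.

```python
from math import ceil


def traffic(M, N, in_size, out_size, m, n, active_controller=False):
    """Elements moved over the interconnect for one convolution layer.

    Hypothetical first-order model: m of M input channels and n of N output
    channels are processed per sub-task; weight traffic is ignored.
    """
    in_groups = ceil(M / m)    # sub-tasks needed to cover all input channels
    out_groups = ceil(N / n)   # sub-tasks needed to cover all output channels
    # Every input map is re-read once per output-channel group.
    input_reads = out_groups * M * in_size
    # Every output map is written once per input-channel group.
    psum_writes = in_groups * N * out_size
    # Partial sums are read back before each accumulation except the first,
    # unless the memory controller accumulates in place (assumed behaviour).
    psum_reads = 0 if active_controller else (in_groups - 1) * N * out_size
    return input_reads + psum_writes + psum_reads


def best_partition(M, N, in_size, out_size, k, mac_budget, active_controller=False):
    """Scan m, pair it with the largest feasible n, and keep the least traffic."""
    best = None
    for m in range(1, M + 1):
        # Largest n satisfying the assumed constraint m * n * k * k <= mac_budget.
        n = min(N, mac_budget // (k * k * m))
        if n < 1:
            break  # a larger m cannot fit either
        cost = traffic(M, N, in_size, out_size, m, n, active_controller)
        if best is None or cost < best[0]:
            best = (cost, m, n)
    return best  # (traffic, m, n), or None if even m = n = 1 does not fit


if __name__ == "__main__":
    # Illustrative layer: 64 -> 64 channels, 56x56 maps, 3x3 kernel, 576 MACs.
    size = 56 * 56
    print(best_partition(64, 64, size, size, 3, 576))
    print(best_partition(64, 64, size, size, 3, 576, active_controller=True))
```

Under this model, a larger `m` cuts partial-sum traffic while a larger `n` cuts repeated input reads, and the fixed MAC budget forces a choice between them. That tension is what makes the partition worth optimizing.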

Taxonomy

Methods: Convolution