On the Impact of Partial Sums on Interconnect Bandwidth and Memory Accesses in a DNN Accelerator
Mahesh Chandra

TL;DR
This paper introduces an analytical method for partitioning feature maps in DNN accelerators to minimize interconnect and memory bandwidth, demonstrating a reduction of up to 40% through optimal partitioning and active memory control.
Contribution
It proposes a first-order analytical method for partitioning feature maps to minimize bandwidth in DNN accelerators.
Findings
Optimal partitioning reduces bandwidth by up to 40%.
Active memory controllers enable efficient bandwidth savings.
Analytical method provides a systematic partitioning strategy.
Abstract
Dedicated accelerators are being designed to address the huge resource requirements of deep neural network (DNN) applications. Power, performance and area (PPA) constraints limit the number of MACs available in these accelerators. Convolution layers, which require a huge number of MACs, are often partitioned into multiple iterative sub-tasks. This puts heavy pressure on available system resources such as interconnect and memory bandwidth. Optimal partitioning of the feature maps for these sub-tasks can reduce the bandwidth requirement substantially. Some accelerators avoid off-chip or interconnect transfers by implementing local memories; however, the memory accesses are still performed, and reduced bandwidth can help save power in such architectures. In this paper, we propose a first-order analytical method to partition the feature maps for optimal bandwidth and…
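To make the partitioning trade-off concrete, here is a minimal Python sketch of a first-order traffic model. It is an illustration under stated assumptions, not the paper's actual formulation: the names `ConvLayer`, `traffic`, and `best_partition` are hypothetical, the MAC constraint is assumed to be m × n × K² ≤ budget (m input maps and n output maps processed per iteration), weight traffic is ignored, and an active memory controller is assumed to accumulate partial sums at the memory so that they need not be read back.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical layer description used only for this sketch."""
    in_channels: int   # number of input feature maps (M)
    out_channels: int  # number of output feature maps (N)
    in_size: int       # elements per input feature map
    out_size: int      # elements per output feature map
    kernel_area: int   # K * K


def traffic(layer, m, n, active_memory=False):
    """Feature-map elements moved when m input and n output maps are
    processed per iteration (assumed first-order model, weights ignored)."""
    in_passes = ceil(layer.out_channels / n)   # each input map is re-read once per output group
    out_passes = ceil(layer.in_channels / m)   # each output map is updated once per input group
    input_reads = layer.in_channels * layer.in_size * in_passes
    output_writes = layer.out_channels * layer.out_size * out_passes
    # Partial sums are read back before every update except the first,
    # unless the memory side performs the accumulation (assumption).
    psum_reads = 0 if active_memory else layer.out_channels * layer.out_size * (out_passes - 1)
    return input_reads + output_writes + psum_reads


def best_partition(layer, mac_budget, active_memory=False):
    """Exhaustively search m, taking the largest n that fits the MAC budget."""
    best = None
    for m in range(1, layer.in_channels + 1):
        n = min(layer.out_channels, mac_budget // (layer.kernel_area * m))
        if n < 1:
            break  # larger m cannot fit either
        cost = traffic(layer, m, n, active_memory)
        if best is None or cost < best[2]:
            best = (m, n, cost)
    if best is None:
        raise ValueError("MAC budget is smaller than one kernel window")
    return best


if __name__ == "__main__":
    layer = ConvLayer(in_channels=64, out_channels=64,
                      in_size=56 * 56, out_size=56 * 56, kernel_area=9)
    for active in (False, True):
        m, n, cost = best_partition(layer, mac_budget=576, active_memory=active)
        print(f"active_memory={active}: m={m}, n={n}, elements moved={cost}")
```

In this toy model, extreme partitions (all MACs spent on output maps, or all on input maps) move far more data than a balanced split, because either the partial sums or the input maps are transferred many times. The layer dimensions and MAC budget in the usage example are arbitrary values chosen for illustration.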
Taxonomy
Methods: Convolution
