Partitioning Compute Units in CNN Acceleration for Statistical Memory Traffic Shaping

Daejin Jung; Sunjung Lee; Wonjong Rhee; Jung Ho Ahn

arXiv:1806.06541 · cs.DC · June 19, 2018

TL;DR

This paper introduces a compute unit partitioning strategy for CNN accelerators that smooths fluctuations in memory traffic, easing bandwidth bottlenecks and improving ResNet-50 performance by 8% on a commercial 64-core processor.

Contribution

The paper proposes statistical memory traffic shaping: the compute units are partitioned so that the partitions run asynchronously, which mitigates bandwidth bottlenecks in CNN acceleration.
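The smoothing effect can be illustrated with a toy model. The sketch below is not the authors' implementation: the function name `aggregate_demand`, the made-up per-layer demand values, the equal-length layers, and the evenly spaced phase offsets (standing in for the drift between asynchronous partitions) are all assumptions chosen for illustration.

```python
def aggregate_demand(layer_demand, num_groups):
    """Total bandwidth demand per time step for a toy accelerator.

    Assumptions (hypothetical, for illustration only):
    - layer_demand[i] is the demand of layer i when ALL compute units
      work on it together, and every layer takes one time step.
    - The units are split into num_groups equal groups; each group
      contributes 1/num_groups of a layer's demand.
    - Groups run the same layer sequence cyclically, shifted by evenly
      spaced offsets that mimic partitions drifting out of phase.
    """
    n = len(layer_demand)
    totals = []
    for t in range(n):
        step = 0.0
        for g in range(num_groups):
            offset = (g * n) // num_groups  # phase shift of group g
            step += layer_demand[(t + offset) % n] / num_groups
        totals.append(step)
    return totals


# Made-up demand profile: some layers are bandwidth-heavy, others light.
demand = [8.0, 6.0, 2.0, 4.0, 1.0, 3.0]

unpartitioned = aggregate_demand(demand, 1)  # all units on one layer at a time
two_groups = aggregate_demand(demand, 2)     # two groups, half a cycle apart
three_groups = aggregate_demand(demand, 3)   # three groups, a third apart

peak_unpartitioned = max(unpartitioned)
peak_two = max(two_groups)
peak_three = max(three_groups)
```

In this toy profile the average demand stays at 4.0 in every configuration, while the peak drops from 8.0 with a single group to 6.0 with two groups, and falls further with three. A lower peak for the same average is what lets a fixed memory bandwidth be exceeded less often.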

Findings

01. Achieves 8% performance improvement on ResNet-50
02. Reduces memory traffic fluctuations by statistical smoothing
03. Demonstrates effectiveness on a commercial 64-core processor

Abstract

The design complexity of CNNs has been steadily increasing to improve accuracy. To cope with the massive amount of computation needed for such complex CNNs, the latest solutions utilize blocking of an image over the available dimensions and batching of multiple input images to improve data reuse in the memory hierarchy. While there have been numerous works on maximizing data reuse, only a few studies have focused on the memory bottleneck caused by limited bandwidth. A bandwidth bottleneck can easily occur in CNN acceleration because CNN layers have different sizes with varying computation needs and because batching is typically performed over each CNN layer for ideal data reuse. In this case, the data transfer demand for a layer can be relatively low or high compared to the computation requirement of the layer, and hence temporal fluctuations in memory access can be induced, eventually causing…
