Partitioning Compute Units in CNN Acceleration for Statistical Memory Traffic Shaping
Daejin Jung, Sunjung Lee, Wonjong Rhee, Jung Ho Ahn

TL;DR
This paper introduces a compute unit partitioning strategy for CNN accelerators that smooths memory traffic fluctuations, reducing bandwidth bottlenecks and improving ResNet-50 performance by 8% on a commercial 64-core processor.
Contribution
It proposes a statistical memory traffic shaping method through asynchronous partitioning of compute units to mitigate bandwidth issues in CNN acceleration.
Findings
Achieves 8% performance improvement on ResNet-50
Reduces memory traffic fluctuations by statistical smoothing
Demonstrates effectiveness on a commercial 64-core processor
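The smoothing effect can be illustrated with a toy model. The sketch below is not the authors' implementation: the per-layer demand values, the one-time-step-per-layer assumption, and the fixed staggered offsets (standing in for partitions that drift apart because they run asynchronously) are all hypothetical choices made for illustration. It compares the aggregate memory demand when all 64 compute units execute the same layer in lockstep against the demand when they are split into 4 partitions working on different layers at the same time.

```python
from statistics import pstdev

# Hypothetical per-layer memory demand of one compute unit (arbitrary units).
# These numbers are illustrative only; they are not measurements from the paper.
LAYER_DEMAND = [8.0, 1.0, 6.0, 2.0, 9.0, 3.0, 2.0, 7.0]


def aggregate_traffic(num_cus, num_partitions, steps):
    """Return total memory demand per time step.

    Assumptions (not from the paper): every layer takes exactly one time
    step, partitions have equal size, and partition p starts a fixed number
    of layers ahead of partition 0 to mimic asynchronous execution.
    """
    cus_per_partition = num_cus // num_partitions
    num_layers = len(LAYER_DEMAND)
    trace = []
    for t in range(steps):
        total = 0.0
        for p in range(num_partitions):
            offset = p * num_layers // num_partitions
            layer = (t + offset) % num_layers
            total += cus_per_partition * LAYER_DEMAND[layer]
        trace.append(total)
    return trace


# One partition: all 64 compute units hit the same layer at the same time.
lockstep = aggregate_traffic(num_cus=64, num_partitions=1, steps=16)
# Four partitions of 16 compute units, each on a different layer.
partitioned = aggregate_traffic(num_cus=64, num_partitions=4, steps=16)

print("lockstep    peak:", max(lockstep), "stdev:", round(pstdev(lockstep), 1))
print("partitioned peak:", max(partitioned), "stdev:", round(pstdev(partitioned), 1))
```

In this toy setting both configurations move the same total amount of data, but the partitioned one has a lower peak and a smaller spread, which is the intuition behind shaping traffic statistically rather than scheduling it explicitly.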
Abstract
The design complexity of CNNs has been steadily increasing to improve accuracy. To cope with the massive amount of computation needed for such complex CNNs, the latest solutions utilize blocking of an image over the available dimensions and batching of multiple input images to improve data reuse in the memory hierarchy. While there have been numerous works on maximizing data reuse, only a few studies have focused on the memory bottleneck caused by limited bandwidth. A bandwidth bottleneck can easily occur in CNN acceleration because CNN layers have different sizes with varying computation needs and because batching is typically performed over each CNN layer for ideal data reuse. In this case, the data transfer demand for a layer can be relatively low or high compared to the computation requirement of the layer, and hence temporal fluctuations in memory access can be induced, eventually causing…
