A Simple Method to Reduce Off-chip Memory Accesses on Convolutional Neural Networks
Doyun Kim, Kyoung-Young Kim, Sangsoo Ko, Sanghyuck Ha

TL;DR
This paper proposes a simple algorithm that maximizes on-chip memory use in neural processing units (NPUs) to significantly reduce off-chip memory accesses in convolutional neural networks, especially for networks built from multi-branch modules, such as Inception-V3.
Contribution
The paper introduces a straightforward method to minimize off-chip memory accesses by optimizing on-chip memory utilization in neural processing units, effective for multi-branch modules.
Findings
Achieves 97.59% reduction in off-chip feature-map data transfer.
Reduces off-chip memory accesses by a factor of 50.
Effective for modules that consist of multiple branches and a merge layer, as found in Inception-V3.
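The two headline figures are reported for different quantities (number of accesses versus amount of feature-map data), so they need not match exactly. The short Python sketch below only does the arithmetic for converting between a percentage reduction and a remaining fraction; the function names are illustrative and do not come from the paper.

```python
def remaining_fraction(reduction_percent: float) -> float:
    """Fraction of the original quantity that is left after a given % reduction."""
    return 1.0 - reduction_percent / 100.0


def reduction_percent(remaining: float) -> float:
    """Percentage reduction implied by a remaining fraction (e.g. 1/50)."""
    return (1.0 - remaining) * 100.0


if __name__ == "__main__":
    left = remaining_fraction(97.59)
    # A 97.59% reduction leaves about 2.41% of the data, roughly 1/41.5.
    print(f"remaining: {left:.4f} (about 1/{1.0 / left:.1f})")
    # A quantity cut to 1/50 of its original value is a 98% reduction.
    print(f"1/50 remaining: {reduction_percent(1 / 50):.2f}% reduction")
```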
Abstract
For convolutional neural networks, a simple algorithm is proposed that reduces off-chip memory accesses by maximally utilizing the on-chip memory of a neural processing unit (NPU). In particular, the algorithm provides an effective way to process a module that consists of multiple branches and a merge layer. For Inception-V3 on Samsung's NPU in Exynos, our evaluation shows that the proposed algorithm reduces off-chip memory accesses by a factor of 50, and accordingly achieves a 97.59% reduction in the amount of feature-map data transferred to and from off-chip memory.
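The abstract does not spell out the algorithm, so the following is only a toy model of the general idea, not the authors' method. It compares off-chip feature-map traffic for one multi-branch module under two policies: a baseline where every feature map round-trips through off-chip memory, and a greedy policy that keeps a feature map on chip whenever it fits. The `Module` layout, the greedy rule, the choice to reserve the merge buffer up front, and all sizes are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    """Hypothetical multi-branch module: one shared input, several branches, one concat merge."""
    input_size: int            # size of the shared input feature map
    branches: List[List[int]]  # per branch: output size of each layer, in order


def baseline_traffic(m: Module) -> int:
    """Every layer reads its input from, and writes its output to, off-chip memory."""
    total = 0
    for branch in m.branches:
        prev = m.input_size
        for out in branch:
            total += prev + out
            prev = out
    merged = sum(b[-1] for b in m.branches)
    return total + 2 * merged  # merge layer reads all branch outputs, writes the result


def onchip_traffic(m: Module, capacity: int) -> int:
    """Greedy policy (an assumption): keep a feature map on chip whenever it fits."""
    free, traffic = capacity, 0

    # Keep the shared input resident so every branch can reuse it.
    input_on_chip = m.input_size <= free
    if input_on_chip:
        free -= m.input_size
        traffic += m.input_size  # loaded from off-chip once

    # Reserve the merge buffer so branches write their final outputs straight into it.
    merged = sum(b[-1] for b in m.branches)
    merge_on_chip = merged <= free
    if merge_on_chip:
        free -= merged

    for branch in m.branches:  # branches are processed one at a time
        prev_size, prev_on_chip, temp_in_use = m.input_size, input_on_chip, 0
        for i, out in enumerate(branch):
            last = i == len(branch) - 1
            if not prev_on_chip:
                traffic += prev_size  # input must be read back from off-chip
            out_on_chip = merge_on_chip if last else out <= free - temp_in_use
            if not out_on_chip:
                traffic += out        # output is spilled to off-chip
            temp_in_use = out if (out_on_chip and not last) else 0
            prev_size, prev_on_chip = out, out_on_chip

    # On-chip merge: only the final result is written out. Otherwise: read and rewrite.
    return traffic + (merged if merge_on_chip else 2 * merged)
```

With `Module(input_size=100, branches=[[50, 40], [30], [60, 20, 10]])`, the baseline moves 800 units, while the greedy policy moves 220 units with a capacity of 250 and 180 units when everything fits. These numbers describe the toy model only and say nothing about the measured results on Inception-V3.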
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices
