A Simple Method to Reduce Off-chip Memory Accesses on Convolutional Neural Networks

Doyun Kim; Kyoung-Young Kim; Sangsoo Ko; Sanghyuck Ha

arXiv:1901.09614 · cs.NE · January 29, 2019 · 5 citations

TL;DR

This paper proposes a simple algorithm that makes maximal use of the on-chip memory in a neural processing unit (NPU) to significantly reduce off-chip memory accesses in convolutional neural networks, especially networks built from multi-branch modules, such as Inception-V3.

Contribution

The paper introduces a straightforward method to minimize off-chip memory accesses by optimizing on-chip memory utilization in neural processing units, effective for multi-branch modules.

Findings

01

Achieves a 97.59% reduction in feature-map data transferred to and from off-chip memory for Inception-V3 on Samsung's Exynos NPU.

02

Reduces off-chip memory accesses by a factor of 50.

03

Handles modules that consist of multiple branches and a merge layer, as found in Inception-V3.
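
The two headline figures are stated in different units (a count of accesses and a volume of feature-map data), and the abstract does not say how they are related. As a quick arithmetic check only, the sketch below converts the 97.59% data reduction into the fraction of data that remains and the equivalent reduction factor. The variable names are illustrative and not taken from the paper.

```python
# Convert a percentage reduction into "fraction remaining" and "times less".
reduction_pct = 97.59                      # figure quoted in the findings

remaining_fraction = 1 - reduction_pct / 100
equivalent_factor = 1 / remaining_fraction

print(f"remaining fraction: {remaining_fraction:.4f}")
print(f"equivalent factor:  {equivalent_factor:.1f}x")
```

The data-volume figure therefore corresponds to roughly a 41.5x reduction, close to but not the same as the factor of 50 quoted for accesses.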

Abstract

For convolutional neural networks, we propose a simple algorithm that reduces off-chip memory accesses by making maximal use of the on-chip memory in a neural processing unit (NPU). In particular, the algorithm provides an effective way to process a module that consists of multiple branches and a merge layer. For Inception-V3 on Samsung's NPU in Exynos, our evaluation shows that the proposed algorithm reduces off-chip memory accesses to 1/50 and accordingly achieves a 97.59% reduction in the amount of feature-map data transferred to and from off-chip memory.
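
The abstract does not spell out the algorithm, so the following is only a toy model of the general idea, not the authors' method. It assumes layers run in list order, that an output feature map stays on-chip whenever it fits in the free space, and that anything else is written off-chip once and read back once per consumer. The `Layer` class, the `feature_map_traffic` function, and all sizes are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    inputs: list      # names of the layers whose outputs this layer reads
    out_bytes: int    # size of this layer's output feature map


def feature_map_traffic(layers, on_chip_bytes):
    """Bytes of feature-map data moved to/from off-chip memory (toy model)."""
    size = {l.name: l.out_bytes for l in layers}

    # Remaining consumers per feature map, so on-chip buffers can be freed.
    consumers = {l.name: 0 for l in layers}
    for l in layers:
        for src in l.inputs:
            consumers[src] += 1

    resident = {}     # feature maps currently held on-chip: name -> bytes
    traffic = 0
    for l in layers:
        # Inputs that are not resident must be read from off-chip memory.
        for src in l.inputs:
            if src not in resident:
                traffic += size[src]

        # Keep the output on-chip if it fits and someone will consume it;
        # otherwise (including the module's final output) write it off-chip.
        free = on_chip_bytes - sum(resident.values())
        if consumers[l.name] > 0 and l.out_bytes <= free:
            resident[l.name] = l.out_bytes
        else:
            traffic += l.out_bytes

        # Release inputs once their last consumer has run.
        for src in l.inputs:
            consumers[src] -= 1
            if consumers[src] == 0:
                resident.pop(src, None)
    return traffic


# Hypothetical multi-branch module: three branches feeding one merge layer.
module = [
    Layer("in", [], 100),
    Layer("b1", ["in"], 40),
    Layer("b2a", ["in"], 30),
    Layer("b2b", ["b2a"], 50),
    Layer("b3", ["in"], 20),
    Layer("concat", ["b1", "b2b", "b3"], 110),
]

baseline = feature_map_traffic(module, on_chip_bytes=0)
buffered = feature_map_traffic(module, on_chip_bytes=400)
print(baseline, buffered)
```

In this toy setup, the shared input is the expensive part of the baseline: every branch re-reads it from off-chip memory. Keeping it resident until the last branch has run, and keeping the branch outputs resident until the merge layer consumes them, leaves only the final output to be written off-chip.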

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices