Efficient Memory Management for Deep Neural Net Inference

Yury Pisarchyk; Juhyun Lee

arXiv:2001.03288 · cs.LG · February 18, 2020 · 5 citations

TL;DR

This paper presents strategies for sharing memory buffers among intermediate tensors during deep neural network inference, reducing the memory footprint on resource-constrained mobile and embedded devices.

Contribution

The paper explores memory management strategies that share buffers among intermediate tensors, achieving an up to 11% smaller memory footprint than the state of the art.

Findings

01. Up to 11% reduction in memory footprint

02. Improved memory sharing strategies for neural nets

03. Enhanced inference efficiency on edge devices

Abstract

While deep neural net inference was once considered a task for servers only, the latest advances in technology allow inference to be moved to mobile and embedded devices, which is desirable for reasons ranging from latency to privacy. These devices are limited not only by their compute power and battery, but also by their smaller physical memory and cache, so an efficient memory manager becomes a crucial component for deep neural net inference at the edge. We explore various strategies to smartly share memory buffers among intermediate tensors in deep neural nets. Employing these can result in up to 11% smaller memory footprint than the state of the art.
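To make the idea of sharing buffers among intermediate tensors concrete, here is a minimal Python sketch. It is not taken from the paper and should not be read as the authors' method. It assumes each intermediate tensor is live from the op that produces it to the last op that reads it, and it applies one simple greedy heuristic: visit tensors from largest to smallest and place each in the first buffer whose current tenants have no overlapping lifetime. The names `TensorUsage`, `overlaps`, and `assign_shared_buffers`, and the toy sizes, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TensorUsage:
    """Lifetime record for one intermediate tensor (illustrative fields)."""
    name: str
    size: int      # size in bytes
    first_op: int  # index of the op that produces the tensor
    last_op: int   # index of the last op that reads the tensor


def overlaps(a: TensorUsage, b: TensorUsage) -> bool:
    """Two tensors conflict if their [first_op, last_op] intervals intersect."""
    return a.first_op <= b.last_op and b.first_op <= a.last_op


def assign_shared_buffers(
    tensors: List[TensorUsage],
) -> Tuple[Dict[str, int], List[int]]:
    """Greedy assignment: largest tensor first, reuse the first compatible buffer.

    Returns a map from tensor name to buffer index, and the size of each buffer.
    """
    tenants: List[List[TensorUsage]] = []  # tensors already placed in each buffer
    sizes: List[int] = []                  # size of each buffer
    assignment: Dict[str, int] = {}

    for t in sorted(tensors, key=lambda u: u.size, reverse=True):
        for idx, placed in enumerate(tenants):
            if all(not overlaps(t, other) for other in placed):
                placed.append(t)
                assignment[t.name] = idx
                break
        else:
            # No existing buffer is free for this lifetime: allocate a new one.
            # Tensors arrive in descending size order, so the first tenant
            # fixes the buffer size and later tenants always fit.
            tenants.append([t])
            sizes.append(t.size)
            assignment[t.name] = len(sizes) - 1

    return assignment, sizes


# Toy chain of ops: each tensor is produced by op i and read by op i + 1.
tensors = [
    TensorUsage("t0", 32, 0, 1),
    TensorUsage("t1", 64, 1, 2),
    TensorUsage("t2", 16, 2, 3),
    TensorUsage("t3", 32, 3, 4),
]
assignment, buffer_sizes = assign_shared_buffers(tensors)
print(sum(buffer_sizes), sum(t.size for t in tensors))  # prints: 96 144
```

In this toy chain, two shared buffers (64 and 32 bytes) replace four separate allocations totaling 144 bytes. Real networks with branches and skip connections have more complex lifetime overlaps, which is where the choice of sharing strategy starts to matter.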

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques