Diagonal Memory Optimisation for Machine Learning on Micro-controllers

Peter Blacker; Christopher Paul Bridges; Simon Hadfield

arXiv:2010.01668 · cs.DC · November 18, 2020

TL;DR

This paper introduces a diagonal memory optimisation technique for machine learning inference on micro-controllers, reducing memory usage by up to 34.5% and enabling deployment of models on limited hardware.

Contribution

It presents three methods for safely overlapping the input and output buffers of tensor operations, reducing memory use for ML inference on micro-controllers.
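The summary above does not describe the three methods themselves, so the following is only a minimal sketch of the general idea of a safe buffer overlap, not the authors' implementation. It assumes a 1D "same"-padded convolution that produces one output element per step, reads all of its inputs before writing, and stores the output at the start of a shared arena with the input shifted by `d` elements. All function names (`min_safe_offset`, `same_conv_reads`, `run_overlapped`, `run_separate`) are hypothetical.

```python
def min_safe_offset(reads_per_step):
    """Smallest d >= 0 so that the input at arena[d:d+n] can share memory
    with the output at arena[0:m], given that step i writes arena[i].

    Input element r lives at address d + r and is overwritten at step d + r,
    so every read of r must happen at a step <= d + r.
    """
    last_read = {}
    for step, reads in enumerate(reads_per_step):
        for r in reads:
            last_read[r] = step  # later steps overwrite earlier entries
    return max([0] + [step - r for r, step in last_read.items()])


def same_conv_reads(n, k):
    """Input indices read by each output step of a same-padded 1D convolution."""
    p = k // 2
    return [[j for j in range(i - p, i + p + 1) if 0 <= j < n] for i in range(n)]


def run_separate(x, weights):
    """Reference result using separate input and output buffers."""
    n, p = len(x), len(weights) // 2
    out = [0.0] * n
    for i in range(n):
        for t, w in enumerate(weights):
            j = i - p + t
            if 0 <= j < n:
                out[i] += w * x[j]
    return out


def run_overlapped(x, weights, d):
    """Same convolution, with input and output sharing one arena.

    The input is placed at arena[d:d+n]; output element i is written to arena[i].
    """
    n, p = len(x), len(weights) // 2
    arena = [0.0] * (n + d)
    arena[d:d + n] = x
    for i in range(n):
        acc = 0.0
        for t, w in enumerate(weights):
            j = i - p + t
            if 0 <= j < n:
                acc += w * arena[d + j]  # read every input before writing
        arena[i] = acc
    return arena[:n]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [1.0, 1.0, 1.0]
d = min_safe_offset(same_conv_reads(len(x), len(weights)))
result = run_overlapped(x, weights, d)
print(d, len(x) + d, 2 * len(x))  # offset, shared arena size, separate-buffer size
```

In this toy case the shared arena needs `n + d` elements rather than the `2 * n` needed for separate buffers. Running `run_overlapped` with an offset smaller than `d` corrupts inputs that later steps still need, which is the failure a safe overlap has to rule out.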

Findings

01 Memory savings of up to 34.5% achieved
02 Enables deployment of models on constrained hardware
03 Identifies models that require optimisation for deployment

Abstract

As machine learning spreads into more and more application areas, micro-controllers and low-power CPUs are increasingly being used to perform inference with machine learning models. The capability to deploy onto these limited hardware targets is enabling machine learning models to be used across a diverse range of new domains. Optimising the inference process on these targets poses different challenges from either desktop CPU or GPU implementations, because the small amount of RAM available on these targets limits the size of models that can be executed. Analysis of the memory use patterns of eleven machine learning models was performed. Memory load and store patterns were observed using a modified version of the Valgrind debugging tool, identifying memory areas holding values necessary for the calculation as inference progressed. These analyses identified opportunities to optimise the…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Computational Physics and Python Applications