Diagonal Memory Optimisation for Machine Learning on Micro-controllers
Peter Blacker, Christopher Paul Bridges, Simon Hadfield

TL;DR
This paper introduces a diagonal memory optimisation technique for machine learning inference on micro-controllers, reducing memory usage by up to 34.5% and enabling deployment of models on limited hardware.
Contribution
The paper presents three methods to safely overlap input and output buffers in tensor operations, optimising memory use for ML inference on micro-controllers.
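The idea of overlapping input and output buffers can be illustrated with a small sketch. This summary does not describe the paper's three methods, so the code below is not the authors' implementation. It is a minimal illustration that assumes a hypothetical 1-D three-tap sliding sum as the tensor operation and a single shared arena holding both buffers. The names `sliding_sum3_reference`, `run_overlapped`, `in_off` and `out_off` are invented for this example.

```python
def sliding_sum3_reference(inp):
    """Three-tap sliding sum using separate input and output buffers."""
    return [inp[i] + inp[i + 1] + inp[i + 2] for i in range(len(inp) - 2)]


def run_overlapped(n, in_off, out_off):
    """Run the same op with input and output placed in one shared arena.

    The input occupies arena[in_off : in_off + n] and the output is written
    to arena[out_off : out_off + n - 2]. Returns (output, arena_length).
    """
    inp = list(range(1, n + 1))  # hypothetical test input 1..n
    out_len = n - 2
    arena = [0] * max(in_off + n, out_off + out_len)
    arena[in_off:in_off + n] = inp
    for i in range(out_len):
        # Read every input needed for this step before writing the result.
        acc = arena[in_off + i] + arena[in_off + i + 1] + arena[in_off + i + 2]
        # This write is only safe if it lands on an input element that no
        # later step will read.
        arena[out_off + i] = acc
    return arena[out_off:out_off + out_len], len(arena)


n = 8
reference = sliding_sum3_reference(list(range(1, n + 1)))

# Output starts at the same address as the input: each write lands on an
# element that has already been consumed, so the result is correct.
safe_out, safe_size = run_overlapped(n, in_off=0, out_off=0)

# Output starts one element later: the first write clobbers an input value
# that the next step still needs, so the result is corrupted.
unsafe_out, _ = run_overlapped(n, in_off=0, out_off=1)

print(safe_out == reference, safe_size, unsafe_out == reference)
```

In this toy case the safe placement needs an arena of 8 elements, whereas separate buffers would need 8 + 6 = 14. Whether an overlap is safe depends on the order in which a given operation reads and writes, which is why the offset has to be chosen per operation.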
Findings
Memory savings of up to 34.5% achieved
Enables deployment of models on constrained hardware
Identifies models that require optimisation for deployment
Abstract
As machine learning spreads into more and more application areas, micro-controllers and low-power CPUs are increasingly being used to perform inference with machine learning models. The capability to deploy onto these limited hardware targets is enabling machine learning models to be used across a diverse range of new domains. Optimising the inference process on these targets poses different challenges from either desktop CPU or GPU implementations, because the small amount of RAM available on these targets sets limits on the size of models which can be executed. Analysis of the memory use patterns of eleven machine learning models was performed. Memory load and store patterns were observed using a modified version of the Valgrind debugging tool, identifying memory areas holding values necessary for the calculation as inference progressed. These analyses identified opportunities to optimise the…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Computational Physics and Python Applications
