MAFAT: Memory-Aware Fusing and Tiling of Neural Networks for Accelerated Edge Inference
Jackson Farley, Andreas Gerstlauer

TL;DR
This paper introduces a memory-aware fusing and tiling method for neural networks that reduces memory usage and accelerates edge inference by optimizing layer grouping and execution configurations.
Contribution
It extends prior distributed partitioning techniques to single-device execution, enabling memory reduction and speedup through optimized fusing and tiling strategies.
Findings
Reduces memory footprint by more than 50% on YOLOv2.
Achieves up to 2.78x speedup under memory constraints.
Achieves latency within 6% of optimal manual configurations.
Abstract
A rising research challenge is running costly machine learning (ML) networks locally on resource-constrained edge devices. ML networks with large convolutional layers can easily exceed available memory, increasing latency due to excessive OS swapping. Previous memory reduction techniques such as pruning and quantization reduce model accuracy and often require retraining. Alternatively, distributed methods partition the convolutions into equivalent smaller sub-computations, but the implementations introduce communication costs and require a network of devices. Distributed partitioning approaches can, however, also be used to run in a reduced memory footprint on a single device by subdividing the network into smaller operations. In this paper, we extend prior work on distributed partitioning into a memory-aware execution on a single device. Our approach extends prior fusing strategies to…
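The idea of partitioning a convolution into equivalent smaller sub-computations can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a single-channel, stride-1, unpadded convolution on nested lists, and the function names (`conv2d_valid`, `split`, `conv2d_tiled`) are hypothetical. Each output tile is computed from its own input patch, which must include a halo of `kernel size - 1` extra rows and columns, so only one patch needs to be resident at a time.

```python
def conv2d_valid(x, k):
    """Naive single-channel 'valid' 2D convolution (cross-correlation form)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(k[i][j] * x[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def split(n, parts):
    """Split range(n) into `parts` contiguous [start, stop) intervals."""
    bounds = [n * p // parts for p in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def conv2d_tiled(x, k, tiles_y, tiles_x):
    """Compute the same result as conv2d_valid, one output tile at a time.

    Assumes tiles_y and tiles_x do not exceed the output height and width.
    """
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y0, y1 in split(oh, tiles_y):
        for x0, x1 in split(ow, tiles_x):
            # Input patch for this output tile, including the halo of
            # (kh - 1) rows and (kw - 1) columns shared with neighbours.
            patch = [row[x0:x1 + kw - 1] for row in x[y0:y1 + kh - 1]]
            tile = conv2d_valid(patch, k)
            for r, tile_row in enumerate(tile):
                out[y0 + r][x0:x1] = tile_row
    return out


# Illustrative input: an 8x10 integer image and a 3x3 kernel.
image = [[(r * 7 + c * 3) % 11 for c in range(10)] for r in range(8)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
assert conv2d_tiled(image, kernel, 2, 3) == conv2d_valid(image, kernel)
```

The halo is what makes the tiles overlap on the input side: finer tiling shrinks the per-tile working set but recomputes or reloads more overlapping input, which is the kind of trade-off a memory-aware configuration search has to weigh.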
Taxonomy
Topics: Neural Networks and Applications
Methods: Pruning · Softmax · 1x1 Convolution · Max Pooling · Convolution · Batch Normalization · Average Pooling · Global Average Pooling · Darknet-19 · YOLOv2
