MAFAT: Memory-Aware Fusing and Tiling of Neural Networks for Accelerated Edge Inference

Jackson Farley; Andreas Gerstlauer

arXiv:2107.06960 · cs.LG · July 20, 2023

TL;DR

This paper introduces a memory-aware fusing and tiling method for neural networks that reduces memory usage and accelerates edge inference by optimizing layer grouping and execution configurations.

Contribution

It extends prior distributed partitioning techniques to single-device execution, enabling memory reduction and speedup through optimized fusing and tiling strategies.

Findings

01. Reduces memory footprint by more than 50% on YOLOv2.

02. Achieves up to 2.78x speedup under memory constraints.

03. Latency within 6% of optimal manual configurations.

Abstract

A rising research challenge is running costly machine learning (ML) networks locally on resource-constrained edge devices. ML networks with large convolutional layers can easily exceed available memory, increasing latency due to excessive OS swapping. Previous memory reduction techniques such as pruning and quantization reduce model accuracy and often require retraining. Alternatively, distributed methods partition the convolutions into equivalent smaller sub-computations, but the implementations introduce communication costs and require a network of devices. Distributed partitioning approaches can, however, also be used to run in a reduced memory footprint on a single device by subdividing the network into smaller operations. In this paper, we extend prior work on distributed partitioning into a memory-aware execution on a single device. Our approach extends prior fusing strategies to…
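The abstract's claim that a convolution can be partitioned into equivalent smaller sub-computations can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-channel, stride-1, unpadded convolution, and the function names `conv2d_valid` and `conv2d_tiled` are hypothetical. Each output tile is computed from only the input patch it depends on (the tile plus a border of kernel size minus one), so the working set per step is smaller than the full feature map while the result is unchanged.

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive single-channel 'valid' 2D cross-correlation (stride 1, no padding)."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    # Accumulate one shifted, scaled copy of the input per kernel element.
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * x[i:i + oh, j:j + ow]
    return out

def conv2d_tiled(x, k, tiles_h, tiles_w):
    """Same result as conv2d_valid, computed one output tile at a time.

    Illustrative assumption: the output is split into a tiles_h x tiles_w grid.
    """
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=x.dtype)
    # Boundaries of the output tiles along each axis.
    row_edges = np.linspace(0, oh, tiles_h + 1).astype(int)
    col_edges = np.linspace(0, ow, tiles_w + 1).astype(int)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            # Input patch for this tile: tile extent plus (kernel size - 1) overlap.
            patch = x[r0:r1 + kh - 1, c0:c1 + kw - 1]
            out[r0:r1, c0:c1] = conv2d_valid(patch, k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 20))
k = rng.standard_normal((3, 3))
full = conv2d_valid(x, k)
tiled = conv2d_tiled(x, k, tiles_h=2, tiles_w=3)
print(np.allclose(full, tiled))
```

The overlap between neighbouring patches is the cost of tiling: border inputs are read (and, across several fused layers, recomputed) more than once, which is why the choice of tile grid trades memory against extra work.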

Taxonomy

Topics: Neural Networks and Applications

Methods: Pruning · Softmax · 1x1 Convolution · Max Pooling · Convolution · Batch Normalization · Average Pooling · Global Average Pooling · Darknet-19 · YOLOv2