Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models

Amirali Boroumand; Saugata Ghose; Berkin Akin; Ravi Narayanaswami; Geraldo F. Oliveira; Xiaoyu Ma; Eric Shiu; Onur Mutlu

arXiv:2103.00768 · cs.AR · March 2, 2021 · 20 citations

TL;DR

This paper analyzes the limitations of a commercial Edge TPU on 24 Google edge NN models and introduces Mensa, a heterogeneous acceleration framework that improves energy efficiency by 3.0x and throughput by 3.1x over the Edge TPU.

Contribution

The paper presents Mensa, a novel heterogeneous acceleration framework that dynamically assigns neural network layers to specialized accelerators, addressing the heterogeneity in layer characteristics.
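To make the idea of per-layer assignment concrete, here is a minimal sketch, not Mensa's actual scheduler. It assumes a roofline-style cost estimate (the slower of compute time and parameter-transfer time) and picks the accelerator with the lowest estimate for each layer. The `Layer` and `Accelerator` types, the cost model, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    macs: int          # multiply-accumulate operations (hypothetical value)
    param_bytes: int   # parameter footprint in bytes (hypothetical value)


@dataclass(frozen=True)
class Accelerator:
    name: str
    peak_macs_per_s: float   # assumed peak compute rate
    mem_bytes_per_s: float   # assumed parameter-fetch bandwidth


def estimated_time(layer: Layer, acc: Accelerator) -> float:
    """Assumed cost model: a layer is bound by the slower of its
    compute time and the time needed to stream its parameters."""
    compute_time = layer.macs / acc.peak_macs_per_s
    memory_time = layer.param_bytes / acc.mem_bytes_per_s
    return max(compute_time, memory_time)


def assign_layers(layers, accelerators):
    """Map each layer name to the accelerator with the lowest estimated time."""
    return {
        layer.name: min(accelerators, key=lambda a: estimated_time(layer, a)).name
        for layer in layers
    }


# Toy inputs: one compute-oriented and one bandwidth-oriented accelerator.
accelerators = [
    Accelerator("compute_centric", peak_macs_per_s=4e12, mem_bytes_per_s=1e10),
    Accelerator("data_centric", peak_macs_per_s=5e11, mem_bytes_per_s=1e11),
]
layers = [
    Layer("conv_block", macs=2_000_000_000, param_bytes=100_000),
    Layer("lstm_gate", macs=2_000_000, param_bytes=4_000_000),
]

assignment = assign_layers(layers, accelerators)
print(assignment)
```

Under these assumed numbers, the compute-heavy layer goes to the compute-oriented accelerator and the parameter-heavy layer goes to the bandwidth-oriented one. A real scheduler would also need to account for inter-accelerator data movement, which this sketch ignores.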

Findings

01

Mensa improves energy efficiency by 3.0x and throughput by 3.1x over the Edge TPU.

02

All layers in Google edge models naturally group into a small number of clusters.

03

Mensa achieves these improvements with only three specialized accelerators.
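The grouping of layers into a few clusters can be illustrated with a small sketch. This excerpt does not state which layer characteristics or which clustering method the paper uses, so the two features here (MACs per parameter byte as a proxy for parameter reuse, and parameter footprint), the thresholds, and the layer values are all assumptions for illustration.

```python
from collections import defaultdict


def cluster_key(macs, param_bytes,
                reuse_threshold=100.0, footprint_threshold=1_000_000):
    """Bucket a layer by two assumed features: parameter reuse
    (MACs per parameter byte) and parameter footprint in bytes.
    Assumes param_bytes > 0."""
    reuse = macs / param_bytes
    reuse_label = "high_reuse" if reuse >= reuse_threshold else "low_reuse"
    size_label = ("large_footprint" if param_bytes >= footprint_threshold
                  else "small_footprint")
    return (reuse_label, size_label)


def group_layers(layers):
    """Group (name, macs, param_bytes) tuples by their cluster key."""
    groups = defaultdict(list)
    for name, macs, param_bytes in layers:
        groups[cluster_key(macs, param_bytes)].append(name)
    return dict(groups)


# Hypothetical layers, not taken from the paper.
toy_layers = [
    ("conv_a", 2_000_000_000, 100_000),
    ("conv_b", 1_000_000_000, 200_000),
    ("lstm_gate", 4_000_000, 4_000_000),
    ("fc", 1_000_000, 1_000_000),
]

groups = group_layers(toy_layers)
for key, names in groups.items():
    print(key, names)
```

With these toy values, the four layers fall into two buckets, which mirrors the finding only in spirit: a small number of groups, each of which could be served by its own specialized accelerator.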

Abstract

As the need for edge computing grows, many modern consumer devices now contain edge machine learning (ML) accelerators that can compute a wide range of neural network (NN) models while still fitting within tight resource constraints. We analyze a commercial Edge TPU using 24 Google edge NN models (including CNNs, LSTMs, transducers, and RCNNs), and find that the accelerator suffers from three shortcomings, in terms of computational throughput, energy efficiency, and memory access handling. We comprehensively study the characteristics of each NN layer in all of the Google edge models, and find that these shortcomings arise from the one-size-fits-all approach of the accelerator, as there is a high amount of heterogeneity in key layer characteristics both across different models and across different layers in the same model. We propose a new acceleration framework called Mensa. Mensa…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)