MAPLE-Edge: A Runtime Latency Predictor for Edge Devices
Saeejith Nair, Saad Abbasi, Alexander Wong, Mohammad Javad Shafiee

TL;DR
MAPLE-Edge is a latency prediction model tailored for edge devices that uses minimal runtime data to accurately estimate neural network latency, facilitating efficient NAS for optimized edge deployment.
Contribution
MAPLE-Edge extends MAPLE to edge devices by describing the hardware and runtime with a small set of CPU counters and normalization techniques, achieving significant accuracy improvements over prior methods.
Findings
Up to +49.6% accuracy over baseline methods.
Effective generalization across diverse runtimes.
Additional samples improve accuracy by nearly +40%.
Abstract
Neural Architecture Search (NAS) has enabled automatic discovery of more efficient neural network architectures, especially for mobile and embedded vision applications. Although recent research has proposed ways of quickly estimating latency on unseen hardware devices with just a few samples, little focus has been given to the challenges of estimating latency on runtimes that use optimized graphs, such as TensorRT, and specifically on edge devices. In this work, we propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware, where we train a regression network on architecture-latency pairs in conjunction with a hardware-runtime descriptor to effectively estimate latency on a diverse pool of edge devices. Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU…
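The idea of training a regressor on architecture-latency pairs together with a hardware-runtime descriptor can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data is synthetic, the encoding sizes are arbitrary, the helper names (`build_features`, `fit_latency_regressor`, `predict_latency`) are hypothetical, and a ridge regression stands in for the regression network.

```python
import numpy as np

def build_features(arch_encoding, hw_descriptor):
    """Concatenate an architecture encoding with a hardware-runtime descriptor."""
    return np.concatenate([np.asarray(arch_encoding, dtype=float),
                           np.asarray(hw_descriptor, dtype=float)])

def fit_latency_regressor(features, latencies, l2=1e-3):
    """Fit ridge regression with a bias term (a stand-in for a regression network)."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ latencies)

def predict_latency(weights, features):
    """Predict latency for each row of concatenated features."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ weights

# Synthetic stand-in data (NOT from the paper): 4 devices, 50 architectures.
rng = np.random.default_rng(0)
n_devices, n_archs, arch_dim, hw_dim = 4, 50, 6, 3
hw_descriptors = rng.uniform(0.5, 2.0, size=(n_devices, hw_dim))   # e.g. counter-like values
arch_encodings = rng.integers(0, 2, size=(n_archs, arch_dim)).astype(float)
true_w = rng.uniform(0.1, 1.0, size=arch_dim + hw_dim)             # hypothetical ground truth

rows, targets = [], []
for d in range(n_devices):
    for a in range(n_archs):
        x = build_features(arch_encodings[a], hw_descriptors[d])
        rows.append(x)
        targets.append(x @ true_w + rng.normal(0.0, 0.01))          # noisy synthetic latency

X = np.stack(rows)
y = np.array(targets)

weights = fit_latency_regressor(X, y)
preds = predict_latency(weights, X)
mean_abs_error = float(np.mean(np.abs(preds - y)))
print(f"in-sample mean absolute error: {mean_abs_error:.4f}")
```

Because each input row carries the device descriptor alongside the architecture encoding, one model can be trained across a pool of devices rather than fitting a separate predictor per device. The sketch is limited to that point and says nothing about how the paper builds its descriptor or adapts to unseen devices.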
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Parallel Computing and Optimization Techniques
