Benchmarking Edge AI Platforms for High-Performance ML Inference
Rakshith Jayanth, Neelesh Gupta, Viktor Prasanna

TL;DR
This paper benchmarks various edge AI hardware platforms, revealing the strengths and trade-offs of CPU, GPU, and NPU solutions for neural network inference in terms of latency, throughput, and power efficiency.
Contribution
It provides a comprehensive comparison of edge AI hardware, highlighting the performance advantages of NPUs and GPUs for specific neural network tasks, guiding deployment choices.
Findings
NPU is fastest at matrix-vector multiplication and some neural network tasks (video classification, large language models)
GPU excels at matrix multiplication and LSTM networks
NPU offers a good balance of latency, throughput, and power consumption
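One way to make the latency, throughput, and power trade-off concrete is energy per inference, which is average power divided by throughput. The sketch below is illustrative only: the function name and the numbers in the usage example are hypothetical placeholders, not measurements or methods from the paper.

```python
def energy_per_inference(avg_power_watts, throughput_per_s):
    """Return joules spent per inference (hypothetical helper, not from the paper)."""
    if throughput_per_s <= 0:
        raise ValueError("throughput must be positive")
    return avg_power_watts / throughput_per_s


if __name__ == "__main__":
    # Placeholder numbers chosen for illustration; they are NOT results from the paper.
    joules = energy_per_inference(avg_power_watts=10.0, throughput_per_s=100.0)
    print(f"{joules:.3f} J per inference")
```

Under this metric, a platform with lower throughput can still come out ahead if its power draw is proportionally lower.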
Abstract
Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions. While current approaches often involve scaling down modern hardware, the performance characteristics of neural network workloads on these platforms can vary significantly, especially when it comes to parallel processing, which is a critical consideration for edge deployments. To address this, we conduct a comprehensive study comparing the latency and throughput of various linear algebra and neural network inference tasks across CPU-only, CPU/GPU, and CPU/NPU integrated solutions. We find that the Neural Processing Unit (NPU) excels in matrix-vector multiplication (58.6% faster) and some neural network tasks (3.2× faster for video classification and large language models). GPU…
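The kind of latency and throughput measurement the abstract describes can be sketched with a small timing harness. This is a minimal sketch, not the authors' benchmarking setup: the names `benchmark`, `matvec`, and `speedup` are hypothetical, the kernel is a pure-Python stand-in for a real CPU, GPU, or NPU workload, and the warm-up and iteration counts are arbitrary assumptions.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=20):
    """Time fn() and return (median latency in seconds, throughput in calls/s)."""
    for _ in range(warmup):  # warm-up runs are discarded
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    total = sum(samples)
    latency = statistics.median(samples)
    throughput = iters / total if total > 0 else float("inf")
    return latency, throughput


def matvec(matrix, vector):
    """Matrix-vector product in pure Python (stand-in for a real device kernel)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def speedup(baseline_latency, candidate_latency):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_latency / candidate_latency


if __name__ == "__main__":
    n = 64  # arbitrary problem size, assumed for illustration
    matrix = [[float(i + j) for j in range(n)] for i in range(n)]
    vector = [1.0] * n
    latency, throughput = benchmark(lambda: matvec(matrix, vector))
    print(f"median latency: {latency * 1e3:.3f} ms, "
          f"throughput: {throughput:.1f} calls/s")
```

Running the same harness on each platform's implementation of a task and passing the two median latencies to `speedup` yields a ratio of the "N× faster" kind. Whether the paper's percentage figures use this ratio or a relative latency reduction is not stated in the text above.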
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
