On-Device Neural Net Inference with Mobile GPUs
Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, and Matthias Grundmann

TL;DR
This paper demonstrates how to leverage mobile GPUs for real-time neural network inference on smartphones, providing a GPU-optimized inference engine integrated into TensorFlow Lite to improve performance and efficiency.
Contribution
The paper introduces a GPU-based inference engine for mobile devices, enabling real-time neural network processing on virtually all smartphones, and discusses design principles for GPU-friendly neural networks.
Findings
Achieves real-time neural network inference on mobile GPUs
Integrates GPU inference engine into TensorFlow Lite
Provides design guidelines for GPU-efficient neural networks
Abstract
On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing power, thermal constraints, and energy consumption. App developers and researchers have begun exploiting hardware accelerators to overcome these challenges. Recently, device manufacturers are adding neural processing units into high-end phones for on-device inference, but these account for only a small fraction of hand-held devices. In this paper, we present how we leverage the mobile GPU, a ubiquitous hardware accelerator on virtually every phone, to run inference of deep neural networks in real-time for both Android and iOS devices. By describing our architecture, we also discuss how to design networks that are mobile GPU-friendly. Our…
Taxonomy
Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing
