Hardware optimization on Android for inference of AI models
Iulius Gherasim, Carlos García Sánchez

TL;DR
This paper investigates optimal hardware configurations on Android devices for AI inference, focusing on object detection and image classification, to enhance speed while maintaining accuracy.
Contribution
It proposes empirical methods to identify the best combination of quantization and hardware acceleration for AI models on Android.
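Identifying the best configuration empirically starts with a reliable speed measurement for each candidate. The sketch below is framework-agnostic and makes several assumptions. The model invocation is wrapped in a zero-argument callable, and the name `run_inference` and the warm-up and iteration counts are hypothetical choices, not the authors' measurement protocol. Warm-up runs are discarded because accelerator delegates typically pay one-time initialization costs, and the median is reported because it is less sensitive to scheduling outliers than the mean.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(run_inference: Callable[[], object],
                       warmup: int = 5,
                       iterations: int = 50) -> float:
    """Return the median wall-clock latency of run_inference, in milliseconds.

    Assumption: run_inference is a hypothetical zero-argument wrapper around
    one model invocation (e.g. a TFLite interpreter call with a fixed input).
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    # Discard warm-up runs so one-time costs (delegate setup, caches) are excluded.
    for _ in range(warmup):
        run_inference()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is robust to occasional outliers caused by OS scheduling.
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload for illustration; replace with a real model call.
    latency = measure_latency_ms(lambda: sum(range(10_000)))
    print(f"median latency: {latency:.3f} ms")
```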
Findings
Optimal configurations vary by model and task.
Quantization and hardware acceleration significantly improve inference speed.
Trade-offs between accuracy and speed are quantified.
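Quantifying the trade-off requires a rule for choosing among configurations. This summary does not state the paper's own criterion. The minimal sketch below assumes one plausible rule: take the fastest configuration whose accuracy drop relative to a full-precision baseline stays within a tolerance. The `Config` fields, the tolerance, and the labels in the usage example are hypothetical, and the numbers are placeholders rather than results from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    name: str          # hypothetical label, e.g. "int8-npu"
    latency_ms: float  # measured median latency
    accuracy: float    # task metric (e.g. top-1 accuracy or mAP) in [0, 1]


def pick_config(configs: Sequence[Config],
                baseline_accuracy: float,
                max_drop: float) -> Optional[Config]:
    """Return the fastest config whose accuracy drop is at most max_drop.

    Assumed selection rule for illustration; returns None if no config
    satisfies the accuracy constraint.
    """
    eligible = [c for c in configs if baseline_accuracy - c.accuracy <= max_drop]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.latency_ms)


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT measurements from the paper.
    candidates = [
        Config("fp32-cpu", 40.0, 0.76),
        Config("fp16-gpu", 15.0, 0.759),
        Config("int8-npu", 6.0, 0.74),
    ]
    best = pick_config(candidates, baseline_accuracy=0.76, max_drop=0.01)
    print(best.name if best else "no configuration meets the accuracy budget")
```

Loosening `max_drop` lets more aggressive quantization qualify. Sweeping it therefore traces the speed/accuracy frontier rather than committing to a single operating point.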
Abstract
Artificial Intelligence models are now pervasive in contemporary mobile computing, in use cases ranging from virtual assistants to advanced image processing. A good mobile user experience requires deployed AI models to deliver minimal latency and high responsiveness. The challenges range from designing execution strategies that meet real-time constraints to exploiting heterogeneous hardware architectures. In this paper, we investigate and propose optimal execution configurations for AI models on an Android system, focusing on two critical tasks: object detection (YOLO family) and image classification (ResNet). The configurations cover several model quantization schemes and the use of on-device accelerators, specifically the GPU and NPU. Our core objective is to empirically determine the combination that achieves the best trade-off between…
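To make "model quantization schemes" concrete, the sketch below shows per-tensor affine (asymmetric) int8 quantization. It is one common scheme; this summary does not list the exact schemes the paper evaluates. A real value x maps to q = clamp(round(x / s) + z), and back to x ≈ s·(q − z), where s is the scale and z the zero point. The function names are hypothetical and the code is written in plain Python for clarity, not taken from any mobile runtime.

```python
from typing import List, Sequence, Tuple


def affine_params(x_min: float, x_max: float,
                  q_min: int = -128, q_max: int = 127) -> Tuple[float, int]:
    """Compute (scale, zero_point) mapping [x_min, x_max] onto [q_min, q_max]."""
    # Widen the range to include 0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    if x_max == x_min:  # degenerate all-zero tensor
        return 1.0, 0
    scale = (x_max - x_min) / (q_max - q_min)
    zero_point = round(q_min - x_min / scale)
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point


def quantize(xs: Sequence[float], scale: float, zero_point: int,
             q_min: int = -128, q_max: int = 127) -> List[int]:
    """q = clamp(round(x / scale) + zero_point) to the int8 range."""
    return [max(q_min, min(q_max, round(x / scale) + zero_point)) for x in xs]


def dequantize(qs: Sequence[int], scale: float, zero_point: int) -> List[float]:
    """x ~= scale * (q - zero_point); the error is at most scale / 2 in range."""
    return [scale * (q - zero_point) for q in qs]


if __name__ == "__main__":
    scale, zp = affine_params(-1.0, 3.0)
    values = [-1.0, 0.0, 0.5, 3.0]
    codes = quantize(values, scale, zp)
    print(scale, zp, codes, dequantize(codes, scale, zp))
```

The rounding error introduced here is the source of the accuracy loss that must be weighed against the speed-up from running integer kernels on the NPU or GPU.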
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Big Data and Digital Economy
