Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Han Cai; Ji Lin; Yujun Lin; Zhijian Liu; Haotian Tang; Hanrui Wang; Ligeng Zhu; Song Han

arXiv:2204.11786·cs.LG·April 26, 2022

TL;DR

This paper reviews methods, systems, and applications that let deep learning models run efficiently on resource-constrained mobile devices, covering model compression, AutoML, on-device training, and task-specific acceleration.

Contribution

It provides a comprehensive overview of recent advances in efficient deep learning techniques, including automated model design and system optimization for mobile AI deployment.

Findings

01

Model compression techniques such as pruning, quantization, and factorization reduce the computational and memory cost of DNNs.

02

AutoML frameworks automate the design of compact and efficient models.

03

Task-specific accelerations enhance performance for point cloud, video, and NLP tasks.
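
To make the first finding concrete, here is a minimal sketch of two of the compression techniques it names: magnitude pruning followed by symmetric 8-bit quantization. This is a generic textbook illustration in plain Python, not the procedure used in the paper; the function names (`magnitude_prune`, `quantize_int8`, `dequantize`), the 50% sparsity level, and the toy weight vector are all assumptions chosen for the example.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric uniform quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale per tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integer codes back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical toy weights, for illustration only.
weights = [0.8, -0.05, 0.3, -1.27, 0.01, 0.6]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

In this toy run, pruning zeroes the three smallest-magnitude weights, and each surviving weight is then stored as a single signed byte plus one shared `scale`. The reconstruction error per weight is bounded by half a quantization step (`scale / 2`). Real deployments typically prune and quantize per layer or per channel and fine-tune afterwards to recover accuracy.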

Abstract

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their application on many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that are able to lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand in order to enable numerous edge AI applications. This paper provides an overview of efficient deep learning methods, systems and applications. We start by introducing popular model compression methods, including pruning, factorization, quantization as well as compact model design. To reduce the large design cost of these…

Taxonomy

Methods: Pruning