Boosting DNN Cold Inference on Edge Devices
Rongjie Yi, Ting Cao, Ao Zhou, Xiao Ma, Shangguang Wang, Mengwei Xu

TL;DR
This paper introduces NNV12, an on-device inference engine optimized for cold DNN inference on edge devices, employing novel techniques to significantly improve startup performance.
Contribution
NNV12 is the first inference engine specifically designed to optimize cold DNN inference on edge devices, utilizing three novel optimization techniques and a heuristic scheduling scheme.
Findings
NNV12 achieves up to 15.2x speedup on edge CPUs.
NNV12 achieves up to 401.5x speedup on edge GPUs.
The prototype demonstrates substantial performance improvements over existing engines.
Abstract
DNNs are ubiquitous on edge devices nowadays. With their increasing importance and use cases, it is unlikely that all DNNs can be packed into device memory with every inference already warmed up. Therefore, cold inference, the process of reading, initializing, and executing a DNN model, is becoming commonplace, and its performance urgently needs to be optimized. To this end, we present NNV12, the first on-device inference engine that optimizes for cold inference. NNV12 is built atop 3 novel optimization knobs: selecting a proper kernel (implementation) for each DNN operator, bypassing the weights transformation process by caching the post-transformed weights on disk, and pipelined execution of many kernels on asymmetric processors. To tackle the huge search space, NNV12 employs a heuristic-based scheme to obtain a near-optimal kernel scheduling plan. We fully implement a prototype of…
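The second knob in the abstract, caching post-transformed weights on disk, can be illustrated with a minimal sketch. This is not NNV12's implementation: `transform_weights`, `load_weights`, the cache file naming, and the kernel name `hypothetical_kernel` are all assumptions made for illustration, and the "transformation" is just a matrix transpose standing in for a kernel-specific layout change.

```python
import os
import pickle
import tempfile


def transform_weights(raw):
    """Hypothetical stand-in for a kernel-specific weight transformation.

    It only transposes a 2-D weight matrix (a layout change); a real
    engine would apply whatever layout its chosen kernel expects.
    """
    return [list(col) for col in zip(*raw)]


def load_weights(op_name, kernel_name, raw, cache_dir, stats):
    """Return transformed weights, reusing the on-disk copy when it exists."""
    # Assumed naming scheme: one cache file per (operator, kernel) pair.
    path = os.path.join(cache_dir, f"{op_name}.{kernel_name}.bin")
    if os.path.exists(path):
        # Cache hit: the transformation step is bypassed entirely.
        stats["hits"] += 1
        with open(path, "rb") as f:
            return pickle.load(f)
    # Cache miss: transform once and persist the result for later cold starts.
    stats["misses"] += 1
    transformed = transform_weights(raw)
    with open(path, "wb") as f:
        pickle.dump(transformed, f)
    return transformed


def demo():
    raw = [[1, 2, 3], [4, 5, 6]]
    stats = {"hits": 0, "misses": 0}
    with tempfile.TemporaryDirectory() as cache_dir:
        first = load_weights("conv1", "hypothetical_kernel", raw, cache_dir, stats)
        second = load_weights("conv1", "hypothetical_kernel", raw, cache_dir, stats)
    return first, second, stats
```

In this sketch the raw weights are passed in on every call for simplicity; on a cache hit a real engine would presumably skip reading the raw weights as well and load only the cached file.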
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and ELM · Advanced Memory and Neural Computing
