Optimizing CNN Model Inference on CPUs
Yizhi Liu, Yao Wang, Ruofei Yu, Mu Li, Vin Sharma, Yida Wang

TL;DR
This paper introduces NeoCPU, a comprehensive CPU-based CNN inference optimization framework that outperforms existing methods by enabling joint operation- and graph-level optimizations without third-party libraries.
Contribution
NeoCPU provides a full-stack, systematic approach to optimize CNN inference on CPUs, allowing end-to-end improvements beyond local operation-level tuning.
Findings
Achieves up to 3.45× lower latency compared to state-of-the-art methods.
Employs operation- and graph-level joint optimization techniques.
Does not rely on third-party libraries for operation implementation.
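To illustrate why joint operation- and graph-level optimization can beat purely local tuning, here is a minimal sketch. It is not NeoCPU's algorithm. It is a toy chain of three operations with two made-up data layouts ("A" and "B"), invented per-operation costs, and an invented layout-conversion cost. The names `OP_COST`, `TRANSFORM_COST`, `local_choice`, and `global_choice` are all hypothetical.

```python
# Hypothetical per-operation cost (in microseconds) for each candidate layout.
# All numbers are invented for illustration only.
OP_COST = [
    {"A": 100, "B": 120},
    {"A": 120, "B": 100},
    {"A": 100, "B": 110},
]
TRANSFORM_COST = 50  # hypothetical cost of converting data between layouts


def total_cost(layouts):
    """End-to-end cost: per-operation costs plus a conversion per layout change."""
    cost = sum(OP_COST[i][layout] for i, layout in enumerate(layouts))
    cost += sum(TRANSFORM_COST for a, b in zip(layouts, layouts[1:]) if a != b)
    return cost


def local_choice():
    """Pick the fastest layout for each operation in isolation."""
    return [min(costs, key=costs.get) for costs in OP_COST]


def global_choice():
    """Dynamic programming over the chain, accounting for conversion costs."""
    # best[layout] = (cheapest cost so far, layouts chosen so far) ending in layout
    best = {layout: (cost, [layout]) for layout, cost in OP_COST[0].items()}
    for costs in OP_COST[1:]:
        nxt = {}
        for layout, cost in costs.items():
            candidates = [
                (prev_cost + (0 if prev == layout else TRANSFORM_COST) + cost,
                 path + [layout])
                for prev, (prev_cost, path) in best.items()
            ]
            nxt[layout] = min(candidates, key=lambda c: c[0])
        best = nxt
    return min(best.values(), key=lambda c: c[0])[1]


if __name__ == "__main__":
    print("local :", local_choice(), total_cost(local_choice()))
    print("global:", global_choice(), total_cost(global_choice()))
```

With these invented numbers, the per-operation choice alternates layouts and pays two conversions, while the chain-wide choice accepts a slightly slower middle operation to avoid converting at all.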
Abstract
The popularity of Convolutional Neural Network (CNN) models and the ubiquity of CPUs imply that better performance of CNN model inference on CPUs can deliver significant gain to a large number of users. To improve the performance of CNN inference on CPUs, current approaches like MXNet and Intel OpenVINO usually treat the model as a graph and use the high-performance libraries such as Intel MKL-DNN to implement the operations of the graph. While achieving reasonable performance on individual operations from the off-the-shelf libraries, this solution makes it inflexible to conduct optimizations at the graph level, as the local operation-level optimizations are predefined. Therefore, it is restrictive and misses the opportunity to optimize the end-to-end inference pipeline as a whole. This paper presents NeoCPU, a comprehensive approach of CNN model inference on CPUs that employs a…
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Advanced Memory and Neural Computing
