HG-Caffe: Mobile and Embedded Neural Network GPU (OpenCL) Inference Engine with FP16 Supporting
Zhuoran Ji

TL;DR
HG-Caffe is an OpenCL-based GPU inference engine for neural networks with FP16 (half-precision) support. On mobile devices it speeds up deep learning inference by up to 20x while lowering peak memory usage, enabling more advanced mobile AI applications.
Contribution
The paper introduces HG-Caffe, a GPU inference engine with FP16 support that delivers up to 20x speedup over the original implementations and lowers peak memory usage for mobile deep learning applications.
Findings
Up to 20x speedup with GPUs over the original implementations
Peak memory usage reduced to about 80% of the original
Enables more advanced mobile AI applications
Abstract
Breakthroughs in deep learning and mobile systems-on-chip are radically changing the way we use our smartphones. However, deep neural network inference remains challenging for edge AI devices because of its heavy computational load on mobile CPUs and severe battery drain. In this paper, we present HG-Caffe, a deep neural network inference engine that runs on GPUs in half precision (FP16). HG-Caffe provides up to 20 times speedup with GPUs compared to the original implementations. In addition to the speedup, it reduces peak memory usage to about 80% of the original. With HG-Caffe, more innovative mobile applications can become reality.
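The central technique in the abstract is storing and computing values in half precision (FP16) instead of single precision (FP32). As a minimal illustration, not HG-Caffe's own code, the Python sketch below uses the standard `struct` module's `e` format (IEEE 754 binary16) to show that FP16 halves per-value storage and that round-to-nearest keeps the relative error within 2^-11 for values in the normal FP16 range. The `weights` list and the helper names are hypothetical.

```python
import struct

def pack_fp32(values):
    """Pack floats as little-endian IEEE 754 binary32 (4 bytes each)."""
    return struct.pack(f"<{len(values)}f", *values)

def pack_fp16(values):
    """Pack floats as little-endian IEEE 754 binary16 (2 bytes each)."""
    return struct.pack(f"<{len(values)}e", *values)

def unpack_fp16(buf):
    """Decode a binary16 buffer back into Python floats."""
    return list(struct.unpack(f"<{len(buf) // 2}e", buf))

# Hypothetical layer weights, purely illustrative (65504.0 is the largest finite FP16 value).
weights = [0.1, -0.25, 1.5, 3.14159, 1e-3, 65504.0]

fp32 = pack_fp32(weights)
fp16 = pack_fp16(weights)
print(len(fp32), len(fp16))  # 24 12

# Round trip through FP16 and measure the worst relative rounding error.
restored = unpack_fp16(fp16)
max_rel_err = max(abs(r - w) / abs(w) for r, w in zip(restored, weights))
print(f"max relative error: {max_rel_err:.2e}")
```

Note that `struct.pack` raises `OverflowError` for values too large for binary16, so the sketch keeps every weight within the finite FP16 range.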
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing
