HG-Caffe: Mobile and Embedded Neural Network GPU (OpenCL) Inference Engine with FP16 Supporting

Zhuoran Ji

arXiv:1901.00858·cs.LG·January 7, 2019·5 cites

TL;DR

HG-Caffe is a GPU (OpenCL) neural network inference engine with FP16 support for mobile and embedded devices. It reports up to 20x speedup over the original implementations and a lower peak memory footprint.

Contribution

The paper introduces HG-Caffe, a novel GPU inference engine with FP16 support that achieves substantial speedup and memory reduction for mobile deep learning applications.

Findings

01

Up to 20x speedup over original implementations

02

Peak memory usage reduced to about 80% of the original

03

Enables more advanced mobile AI applications

Abstract

Breakthroughs in the fields of deep learning and mobile system-on-chips are radically changing the way we use our smartphones. However, deep neural network inference is still a challenging task for edge AI devices due to the computational overhead on mobile CPUs and a severe drain on the batteries. In this paper, we present a deep neural network inference engine named HG-Caffe, which supports GPUs with half precision. HG-Caffe provides up to 20 times speedup with GPUs compared to the original implementations. In addition to the speedup, the peak memory usage is also reduced to about 80%. With HG-Caffe, more innovative and fascinating mobile applications will be turned into reality.
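The abstract does not describe how HG-Caffe stores or converts values, so the following is only a minimal sketch of what half precision means at the storage level, not the paper's method. It packs a few hypothetical weight values as IEEE 754 FP32 and FP16 using Python's standard `struct` module, then compares buffer sizes and the rounding error introduced by FP16. The function names and sample values are assumptions made for illustration.

```python
import struct

def to_fp32_bytes(values):
    """Pack floats as little-endian IEEE 754 single precision (4 bytes each)."""
    return struct.pack(f"<{len(values)}f", *values)

def to_fp16_bytes(values):
    """Pack floats as little-endian IEEE 754 half precision (2 bytes each)."""
    return struct.pack(f"<{len(values)}e", *values)

def from_fp16_bytes(buf):
    """Unpack a half-precision buffer back into Python floats."""
    count = len(buf) // 2
    return list(struct.unpack(f"<{count}e", buf))

# Hypothetical weight values, chosen only for illustration.
weights = [0.1, -1.5, 3.14159, 1024.0]

fp32_buf = to_fp32_bytes(weights)
fp16_buf = to_fp16_bytes(weights)
restored = from_fp16_bytes(fp16_buf)

# Largest absolute difference between the original and FP16-rounded values.
max_abs_error = max(abs(a - b) for a, b in zip(weights, restored))

print(len(fp32_buf), len(fp16_buf))
print(restored, max_abs_error)
```

For these four values the FP32 buffer is 16 bytes and the FP16 buffer is 8 bytes. Per-value storage halves, but this does not by itself predict an engine's peak memory, which also depends on buffers and data that this sketch does not model.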

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing