Fast Implementation of 4-bit Convolutional Neural Networks for Mobile Devices

Anton Trusov; Elena Limonova; Dmitry Slugin; Dmitry Nikolaev; Vladimir V. Arlazarov

arXiv:2009.06488·cs.CV·October 21, 2020

TL;DR

This paper presents an efficient implementation of 4-bit matrix multiplication for quantized neural networks on mobile ARM processors. It runs 2.9 times faster than floating-point multiplication, and a 4-bit quantized OCR network retains 95.0% accuracy.

Contribution

It introduces a novel 4-bit matrix multiplication method optimized for mobile ARM processors and demonstrates its effectiveness in OCR tasks.

Findings

01

4-bit implementation achieves 2.9x speedup over floating-point

02

4-bit networks maintain 95.0% accuracy with 48% speedup

03

Compared to 8-bit, 4-bit offers similar accuracy with greater speed

Abstract

Quantized low-precision neural networks are very popular because they require fewer computational resources for inference and can provide high performance, which is vital for real-time and embedded recognition systems. However, their advantages are apparent for FPGA and ASIC devices, while general-purpose processor architectures are not always able to perform low-bit integer computations efficiently. The most frequently used low-precision neural network model for mobile central processors is an 8-bit quantized network. However, in a number of cases, it is possible to use fewer bits for weights and activations, and the only problem is the difficulty of efficient implementation. We introduce an efficient implementation of 4-bit matrix multiplication for quantized neural networks and perform time measurements on a mobile ARM processor. It shows 2.9 times speedup compared to standard…
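The abstract does not describe the kernel itself. As a rough illustration of what 4-bit storage and integer accumulation involve, the sketch below quantizes values to unsigned 4-bit codes, packs two codes per byte, and computes an integer dot product with zero-point correction. The linear quantization scheme, the nibble layout, and all function names are assumptions made for this example; this is plain Python, not the paper's ARM implementation.

```python
def quantize_4bit(values, scale, zero_point):
    """Map floats to unsigned 4-bit codes in [0, 15] (assumed linear scheme)."""
    codes = []
    for v in values:
        q = int(round(v / scale)) + zero_point
        codes.append(max(0, min(15, q)))  # saturate to the 4-bit range
    return codes


def pack_nibbles(codes):
    """Store two 4-bit codes per byte: even index in the low nibble."""
    codes = list(codes)
    if len(codes) % 2:
        codes.append(0)  # pad to an even length
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed, count):
    """Inverse of pack_nibbles; returns the first `count` codes."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)  # low nibble
        codes.append(byte >> 4)    # high nibble
    return codes[:count]


def dot_4bit(packed_a, packed_b, count, zero_a, zero_b):
    """Integer dot product of two packed 4-bit vectors, zero points subtracted."""
    a = unpack_nibbles(packed_a, count)
    b = unpack_nibbles(packed_b, count)
    return sum((x - zero_a) * (y - zero_b) for x, y in zip(a, b))
```

A matrix product repeats this dot product for every row of one operand against every column of the other. Packing halves the memory traffic relative to 8-bit storage, but the codes must be unpacked before multiplying, which is one reason an efficient implementation on a general-purpose processor is not trivial.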
