High performance ultra-low-precision convolutions on mobile devices

Andrew Tulloch; Yangqing Jia

arXiv:1712.02427·cs.LG·December 8, 2017·19 cites

Open Access

TL;DR

This paper presents an open-source implementation of ultra-low-precision (<4-bit) convolutions for mobile devices, reporting 4x-20x speedups over float32 and int8 baselines on older ARMv7 hardware.

Contribution

The paper contributes an open-source, ultra-low-precision (<4-bit) implementation of the core deep learning primitives for ARMv7 devices, together with a comprehensive performance analysis, enabling faster mobile deep learning inference.

Findings

01

Achieved 4x-20x speedup over float32 and int8 baselines.

02

Demonstrated effectiveness on older ARMv7 mobile devices.

03

Provided open-source implementation for broader adoption.

Abstract

Many applications of mobile deep learning, especially real-time computer vision workloads, are constrained by computation power. This is particularly true for workloads running on older consumer phones, where a typical device might be powered by a single- or dual-core ARMv7 CPU. We provide an open-source implementation and a comprehensive analysis of (to our knowledge) the state-of-the-art ultra-low-precision (<4-bit precision) implementation of the core primitives required for modern deep learning workloads on ARMv7 devices, and demonstrate speedups of 4x-20x over our additional state-of-the-art float32 and int8 baselines.
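
The abstract does not describe how the sub-4-bit primitives are computed. As an illustration only, the sketch below shows one common way to evaluate a low-bit dot product, the inner loop of a convolution: split each operand into bit planes, then combine the planes with AND and popcount. The bit-serial formulation, the unsigned operands, and the names `pack_bitplanes` and `bitserial_dot` are assumptions made for this example, not details confirmed by this page.

```python
def pack_bitplanes(values, bits):
    """Pack unsigned integers (each < 2**bits) into one Python int per bit plane.

    Bit `idx` of plane `b` holds bit `b` of values[idx].
    (Hypothetical helper for this sketch.)
    """
    planes = []
    for b in range(bits):
        plane = 0
        for idx, v in enumerate(values):
            plane |= ((v >> b) & 1) << idx
        planes.append(plane)
    return planes


def bitserial_dot(activations, weights, a_bits, w_bits):
    """Dot product of two unsigned low-bit vectors using AND + popcount.

    Uses the identity
        sum_k a_k * w_k = sum_{i,j} 2**(i+j) * popcount(A_i & W_j)
    where A_i and W_j are the i-th and j-th bit planes.
    """
    a_planes = pack_bitplanes(activations, a_bits)
    w_planes = pack_bitplanes(weights, w_bits)
    total = 0
    for i, a_plane in enumerate(a_planes):
        for j, w_plane in enumerate(w_planes):
            # bin(...).count("1") is a portable popcount for Python ints.
            total += bin(a_plane & w_plane).count("1") << (i + j)
    return total


# Check against the ordinary integer dot product on 2-bit operands.
activations = [3, 1, 0, 2]
weights = [1, 2, 3, 1]
reference = sum(a * w for a, w in zip(activations, weights))
assert bitserial_dot(activations, weights, a_bits=2, w_bits=2) == reference
```

The cost of this formulation grows with the product of the two bit widths, which is why it is only attractive at very low precision; a native implementation would replace the Python integers with packed machine words and a hardware popcount.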

Taxonomy

Topics: Advancements in PLL and VCO Technologies · Parallel Computing and Optimization Techniques · Interconnection Networks and Systems