HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks

James Garland; David Gregg

arXiv:2007.06563·cs.AR·March 2, 2021

Open Access

TL;DR

HOBFLOPS generates hardware-optimized, bitslice-parallel floating-point routines that enable efficient, custom-precision CNN inference on general-purpose processors, and it significantly outperforms the SoftFP16 software floating-point baseline.

Contribution

The paper presents a novel method to generate hardware-optimized, custom-precision bitslice-parallel floating-point routines for CNNs, bridging the gap between FPGA/ASIC accelerators and general-purpose processors.
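To make the term "bitslice-parallel" concrete, here is a minimal sketch, not the paper's generated code. It assumes plain unsigned integers rather than floating-point values and uses a Python int as a stand-in for a wide vector register. The helper names `to_bitslice`, `from_bitslice`, and `bitslice_add` are hypothetical. The point it illustrates is that once the data is transposed into bit-planes, a logic circuit written with AND, OR, and XOR processes every lane at once.

```python
def to_bitslice(values, width):
    """Transpose unsigned ints into `width` bit-planes.

    Bit `lane` of plane `i` holds bit `i` of values[lane], so each plane
    behaves like one machine word carrying a single bit from every lane.
    """
    planes = []
    for i in range(width):
        plane = 0
        for lane, v in enumerate(values):
            plane |= ((v >> i) & 1) << lane
        planes.append(plane)
    return planes


def from_bitslice(planes, lanes):
    """Inverse transpose: rebuild one integer per lane from the bit-planes."""
    return [
        sum(((plane >> lane) & 1) << i for i, plane in enumerate(planes))
        for lane in range(lanes)
    ]


def bitslice_add(a, b):
    """Ripple-carry adder using only XOR/AND/OR on whole planes.

    Each iteration is one full-adder cell evaluated for all lanes at once.
    Returns len(a) + 1 planes; the last plane is the carry-out.
    """
    carry = 0
    out = []
    for x, y in zip(a, b):
        t = x ^ y
        out.append(t ^ carry)
        carry = (x & y) | (t & carry)
    out.append(carry)
    return out


# Eight 4-bit additions performed by one pass over the adder circuit.
xs = [3, 7, 0, 12, 15, 9, 1, 5]
ys = [4, 9, 0, 3, 15, 6, 14, 10]
sums = from_bitslice(
    bitslice_add(to_bitslice(xs, 4), to_bitslice(ys, 4)), len(xs)
)
```

The cost of the adder depends on the bit width, not on the number of lanes, which is why narrower custom formats can be expected to run faster in this style.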

Findings

01

HOBFLOPS16 outperforms SoftFP16 by up to 8x on Intel AVX512.

02

HOBFLOPS9 achieves approximately 6x the performance of HOBFLOPS16 on Arm Neon.

03

HOBFLOPS enables flexible, high-performance low-precision CNN inference on standard CPUs.

Abstract

Convolutional neural networks (CNNs) are typically trained using 16- or 32-bit floating-point (FP), and researchers have shown that low-precision FP can be highly effective for inference. Low-precision FP can be implemented in field programmable gate array (FPGA) and application-specific integrated circuit (ASIC) accelerators, but existing processors do not generally support custom-precision FP. We propose hardware optimized bitslice-parallel floating-point operators (HOBFLOPS), a method of generating efficient custom-precision emulated bitslice-parallel software FP arithmetic. We generate custom-precision FP routines optimized using a hardware synthesis design flow to create circuits. We provide standard cell libraries matching the bitwise operations on the target microprocessor architecture, and a code-generator to translate the hardware circuits to bitslice software…
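The step of translating a circuit into bitslice software can be sketched as follows. This is an illustration under stated assumptions, not the paper's code generator: the netlist format, the `OPS` table, and the `generate` function are all hypothetical, and the sketch emits Python source instead of code for a specific processor's vector instructions. Each gate in the netlist becomes one bitwise statement, so the emitted function evaluates the circuit on every bit position of its operands simultaneously.

```python
# Hypothetical gate library: each cell maps to one bitwise operator,
# loosely mirroring a standard cell library restricted to CPU bitwise ops.
OPS = {"AND": "&", "OR": "|", "XOR": "^"}

# Hypothetical netlist for a one-bit full adder, in topological order.
# Each entry is (output wire, gate type, (input wire, input wire)).
NETLIST = [
    ("t", "XOR", ("a", "b")),
    ("s", "XOR", ("t", "cin")),
    ("g", "AND", ("a", "b")),
    ("p", "AND", ("t", "cin")),
    ("cout", "OR", ("g", "p")),
]


def generate(name, inputs, outputs, netlist):
    """Emit Python source for a function that evaluates the netlist."""
    lines = [f"def {name}({', '.join(inputs)}):"]
    for out, op, (lhs, rhs) in netlist:
        lines.append(f"    {out} = {lhs} {OPS[op]} {rhs}")
    lines.append(f"    return {', '.join(outputs)}")
    return "\n".join(lines) + "\n"


source = generate("full_adder", ["a", "b", "cin"], ["s", "cout"], NETLIST)

# Compile the generated source and pull out the resulting function.
namespace = {}
exec(source, namespace)
full_adder = namespace["full_adder"]

# Eight lanes covering every input combination of the full adder.
s, cout = full_adder(0b11110000, 0b11001100, 0b10101010)
```

Because the generator only walks the gate list, any optimization applied to the circuit before this step carries over directly to the emitted code.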

Taxonomy

Topics: Numerical Methods and Algorithms · Digital Filter Design and Implementation · Model Reduction and Neural Networks

Methods: Convolution