HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks
James Garland, David Gregg

TL;DR
HOBFLOPS introduces hardware-optimized bitslice-parallel floating-point operations that enable efficient, customizable low-precision CNN inference on general-purpose processors, outperforming SoftFP16 by 8x on Intel AVX512.
Contribution
The paper presents a novel method to generate hardware-optimized, custom-precision bitslice-parallel floating-point routines for CNNs, bridging the gap between FPGA/ASIC accelerators and general-purpose processors.
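To make the circuit-to-software translation concrete, here is a minimal sketch of a code generator that turns a gate-level netlist into C bitwise statements. The netlist format, the gate set, the signal names, and the `emit_c` helper are all hypothetical illustrations, not the paper's tooling or file formats.

```python
# Hypothetical gate-level netlist for a 1-bit full adder.
# Each entry is (output_signal, gate, input_1, input_2).
NETLIST = [
    ("t", "XOR", "a", "b"),
    ("sum", "XOR", "t", "cin"),
    ("g", "AND", "a", "b"),
    ("p", "AND", "t", "cin"),
    ("cout", "OR", "g", "p"),
]

# Assumed mapping from gate names to C bitwise operators.
C_OPS = {"AND": "&", "OR": "|", "XOR": "^"}


def emit_c(netlist, word_type="uint64_t"):
    """Emit one C statement per gate; each variable holds one bit of many lanes."""
    lines = []
    for out, gate, lhs, rhs in netlist:
        lines.append(f"{word_type} {out} = {lhs} {C_OPS[gate]} {rhs};")
    return "\n".join(lines)


generated = emit_c(NETLIST)
print(generated)
```

Because every gate becomes a single bitwise instruction on a machine word, swapping `word_type` for a wider vector type would, under the same assumptions, process more lanes per instruction.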
Findings
HOBFLOPS16 outperforms SoftFP16 by 8x on Intel AVX512.
HOBFLOPS9 achieves 6x the performance of HOBFLOPS16 on Arm Neon.
HOBFLOPS enables flexible, high-performance low-precision CNN inference on standard CPUs.
Abstract
Convolutional neural networks (CNNs) are typically trained using 16- or 32-bit floating-point (FP), and researchers have shown that low-precision FP can be highly effective for inference. Low-precision FP can be implemented in field programmable gate array (FPGA) and application-specific integrated circuit (ASIC) accelerators, but existing processors do not generally support custom-precision FP. We propose hardware optimized bitslice-parallel floating-point operators (HOBFLOPS), a method of generating efficient custom-precision emulated bitslice-parallel software FP arithmetic. We generate custom-precision FP routines optimized using a hardware synthesis design flow to create circuits. We provide standard cell libraries matching the bitwise operations on the target microprocessor architecture, and a code-generator to translate the hardware circuits to bitslice software…
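The bitslice-parallel idea can be illustrated with a small sketch. It is deliberately simplified: it adds unsigned integers rather than floating-point values, and the 9-bit width, the function names, and the sample inputs are assumptions chosen for illustration. Each "plane" holds one bit position for every lane, so a single XOR, AND, or OR acts on all lanes at once.

```python
def to_bitslices(values, width):
    """Transpose integers into bit-planes: bit `lane` of planes[i] is bit i of values[lane]."""
    planes = [0] * width
    for lane, v in enumerate(values):
        for i in range(width):
            planes[i] |= ((v >> i) & 1) << lane
    return planes


def from_bitslices(planes, lanes):
    """Inverse transpose: rebuild one integer per lane from the bit-planes."""
    return [
        sum(((p >> lane) & 1) << i for i, p in enumerate(planes))
        for lane in range(lanes)
    ]


def bitslice_add(a_planes, b_planes):
    """Ripple-carry adder built only from XOR, AND, and OR on whole planes."""
    carry = 0
    out = []
    for x, y in zip(a_planes, b_planes):
        t = x ^ y
        out.append(t ^ carry)           # sum bit for every lane
        carry = (x & y) | (t & carry)   # carry-out for every lane
    return out  # final carry is dropped, so results wrap modulo 2**width


WIDTH = 9  # assumed custom width, for illustration only
xs = [3, 100, 255, 511]
ys = [5, 27, 1, 1]
sums = from_bitslices(
    bitslice_add(to_bitslices(xs, WIDTH), to_bitslices(ys, WIDTH)), len(xs)
)
print(sums)  # [8, 127, 256, 0]
```

A floating-point multiply-accumulate would be a much larger circuit, but it would be expressed with the same kind of plane-wise bitwise operations, and the width of each plane would be set by the vector register size rather than by the number format.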
Taxonomy
Topics: Numerical Methods and Algorithms · Digital Filter Design and Implementation · Model Reduction and Neural Networks
Methods: Convolution
