Optimizing data-flow in Binary Neural Networks

L. Vorabbi; D. Maltoni; S. Santi

arXiv:2304.00952 · cs.LG · April 4, 2023 · 1 citation

TL;DR

This paper introduces a comprehensive set of optimizations for Binary Neural Networks, including data flow enhancements and implementation improvements, leading to significantly faster inference without accuracy loss.

Contribution

It presents a novel training scheme and implementation techniques that optimize data flow and reduce latency in BNNs, improving inference speed.

Findings

01

Inference speed improved by up to 2.73x

02

No accuracy loss for at least one full-precision model

03

Optimizations applicable to ARM instruction sets

Abstract

Binary Neural Networks (BNNs) can significantly accelerate neural network inference by replacing expensive floating-point arithmetic with bitwise operations. Most existing solutions, however, do not fully optimize data flow through the BNN layers, and intermediate conversions from 1 to 16/32 bits often further hinder efficiency. We propose a novel training scheme that can increase data flow and parallelism in the BNN pipeline; specifically, we introduce a clipping block that decreases the data width from 32 bits to 8. Furthermore, we reduce, without losing accuracy, the size of a binary layer's internal accumulator, which is usually kept at 32 bits to prevent data overflow. Additionally, we provide an optimization of the Batch Normalization layer that both reduces latency and simplifies deployment. Finally, we present an optimized implementation of the Binary Direct Convolution…
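The abstract names several mechanisms without showing them. The following is a minimal sketch of generic versions of those mechanisms, not the authors' implementation. It assumes the common encoding where bit 1 stands for +1 and bit 0 for -1, computes a binary dot product with XOR and popcount, emulates a narrower accumulator with saturation (the 16-bit width is an illustrative assumption), folds Batch Normalization followed by sign() into one threshold comparison (a widely used technique, not necessarily the paper's exact Batch Normalization optimization), and models the clipping block as a clamp to the signed 8-bit range. All function names are hypothetical.

```python
import math


def pack_bits(values):
    """Pack a list of +1/-1 values into an int bitmask (bit i = 1 means +1)."""
    mask = 0
    for i, v in enumerate(values):
        if v > 0:
            mask |= 1 << i
    return mask


def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1,+1} vectors of length n via XOR and popcount.

    Matching bits contribute +1 and differing bits -1, so with d mismatches
    dot = (n - d) - d = n - 2 * popcount(a XOR b).
    """
    mismatches = bin((a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * mismatches


def saturating_add(acc, x, bits=16):
    """Hypothetical narrow accumulator: signed add that saturates at `bits` width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, acc + x))


def clip_int8(x):
    """Hypothetical clipping block: clamp a value into the signed 8-bit range."""
    return max(-128, min(127, x))


def bn_sign_threshold(mean, var, gamma, beta, eps=1e-5):
    """Fold BatchNorm followed by sign() into a threshold test.

    sign(gamma * (x - mean) / sqrt(var + eps) + beta) is +1 exactly when
    x >= tau (gamma > 0) or x <= tau (gamma < 0), where
    tau = mean - beta * sqrt(var + eps) / gamma.
    Returns (tau, flip). Assumes gamma != 0 and sign(0) = +1.
    """
    tau = mean - beta * math.sqrt(var + eps) / gamma
    return tau, gamma < 0


def binary_activation(x, tau, flip):
    """Apply the folded BatchNorm + sign as a single comparison."""
    positive = x <= tau if flip else x >= tau
    return 1 if positive else -1
```

For example, with a = [1, -1, 1, 1] and b = [1, 1, -1, 1], binary_dot(pack_bits(a), pack_bits(b), 4) returns 0, matching the ordinary dot product. On real hardware the XOR and popcount would map to vector instructions (for instance a byte-wise popcount such as NEON's CNT), and the folded threshold removes the floating-point Batch Normalization arithmetic from the inference path.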

Taxonomy

Topics: Advanced Neural Network Applications · Neural Networks and Applications · Adversarial Robustness in Machine Learning

Methods: Convolution · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Batch Normalization