Binarized Convolutional Neural Networks for Efficient Inference on GPUs
Mir Khan, Heikki Huttunen, Jani Boutellier

TL;DR
This paper presents a method for binarizing convolutional neural networks to enable efficient real-time inference on GPU platforms, significantly reducing computation and memory requirements with minimal accuracy loss.
Contribution
It introduces a GPU-based implementation of binarized CNNs that achieves high speedup and low accuracy loss on resource-constrained devices.
Findings
Achieves up to 7.4x speedup over floating point networks.
Reduces computational load and memory footprint.
Maintains 95.6% accuracy of full-precision models.
Abstract
Convolutional neural networks have recently achieved significant breakthroughs in various image classification tasks. However, they are computationally expensive,which can make their feasible mplementation on embedded and low-power devices difficult. In this paper convolutional neural network binarization is implemented on GPU-based platforms for real-time inference on resource constrained devices. In binarized networks, all weights and intermediate computations between layers are quantized to +1 and -1, allowing multiplications and additions to be replaced with bit-wise operations between 32-bit words. This representation completely eliminates the need for floating point multiplications and additions and decreases both the computational load and the memory footprint compared to a full-precision network implemented in floating point, making it well-suited for resource-constrained…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Medical Image Segmentation Techniques · Advanced Image and Video Retrieval Techniques
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
