Fast matrix multiplication for binary and ternary CNNs on ARM CPU
Anton Trusov, Elena Limonova, Dmitry Nikolaev, Vladimir V. Arlazarov

TL;DR
This paper introduces fast matrix multiplication algorithms for binary and ternary neural networks, optimized for ARM CPUs, making low-bit quantized neural networks practical to deploy on mobile devices.
Contribution
The paper presents novel algorithms that leverage ARM NEON SIMD for fast binary and ternary matrix multiplication, improving inference speed for low-bit neural networks on mobile devices.
Findings
Achieved significant speedup over existing implementations.
Enabled efficient inference of TNNs, TBNs, and BNNs on ARM CPUs.
Reduced memory and computational load for low-bit neural networks.
Abstract
Low-bit quantized neural networks (QNNs) are of great interest in practical applications because they significantly reduce the consumption of both memory and computational resources. Binary neural networks (BNNs) are memory and computationally efficient, as they require only one bit per weight and activation and can be computed using Boolean logic and bit count operations. QNNs with ternary weights and activations (TNNs) and with binary weights and ternary activations (TBNs) aim to improve recognition quality compared to BNNs while preserving low bit-width. However, their efficient implementation is usually considered on ASICs and FPGAs, limiting their applicability in real-life tasks. At the same time, one of the areas where efficient recognition is most in demand is recognition on mobile devices using their CPUs. However, there are no known fast implementations of TBNs and TNNs, only the daBNN library for BNNs…
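To make the "Boolean logic and bit count operations" remark concrete, here is a minimal Python sketch of binary and ternary dot products on bit-packed vectors. It is an illustration only, not the paper's ARM NEON implementation: the function names, the bit encoding (bit set means -1 for binary values; two bit planes, "plus" and "minus", for ternary values), and the use of Python integers as bit words are all assumptions made for this example.

```python
def popcount(x):
    """Count set bits in a non-negative integer."""
    return bin(x).count("1")


def pack_bits(values):
    """Pack a sequence of +1/-1 values into one integer.

    Assumed encoding: bit i is 1 when values[i] == -1, else 0.
    """
    word = 0
    for i, v in enumerate(values):
        if v == -1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given as packed bits.

    Each mismatching position contributes -1 and each matching position +1,
    so the result is n - 2 * (number of mismatches), with mismatches found by XOR.
    """
    return n - 2 * popcount(a_bits ^ b_bits)


def pack_ternary(values):
    """Pack a sequence of -1/0/+1 values into two bit planes (plus, minus).

    Assumed encoding: bit i of `plus` is set for +1, bit i of `minus` for -1,
    and neither bit is set for 0.
    """
    plus = minus = 0
    for i, v in enumerate(values):
        if v == 1:
            plus |= 1 << i
        elif v == -1:
            minus |= 1 << i
    return plus, minus


def ternary_dot(a, b):
    """Dot product of two ternary vectors given as (plus, minus) bit planes."""
    a_plus, a_minus = a
    b_plus, b_minus = b
    positive = popcount((a_plus & b_plus) | (a_minus & b_minus))  # products equal to +1
    negative = popcount((a_plus & b_minus) | (a_minus & b_plus))  # products equal to -1
    return positive - negative
```

A matrix product follows by applying these dot products to each packed row of the left matrix and each packed column of the right matrix. A SIMD implementation would presumably process many such bits per instruction, but the arithmetic identity is the same.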
Taxonomy
Topics: Neural Networks and Applications · Ferroelectric and Negative Capacitance Devices · Brain Tumor Detection and Classification
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
