BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
Yongkweon Jeon, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, and Dongsoo Lee

TL;DR
BiQGEMM is a novel matrix multiplication method optimized for quantized deep neural networks, utilizing lookup tables and pre-computation to enhance efficiency on standard hardware.
Contribution
The paper introduces BiQGEMM, a new computation engine that efficiently supports quantized DNNs by enabling simultaneous weight access and reducing redundant calculations.
Findings
BiQGEMM outperforms conventional matrix multiplication schemes when the DNN weights are quantized.
It reduces computational redundancy through lookup tables and pre-computation.
Experimental results demonstrate higher performance on quantized models.
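The lookup-table idea in the findings above can be sketched for a single binary weight plane. This is a minimal illustration, not the paper's implementation: it assumes weights are stored as +1/-1 values, that the input vector is split into chunks of a hypothetical width `MU`, and that every name below (`build_table`, `row_keys`, `lut_matvec`, `naive_matvec`) is invented for this example. For each chunk of the input, all possible signed sums are computed once; each weight row then looks up its result by bit pattern instead of recomputing the sum.

```python
from itertools import product

MU = 4  # chunk width; a hypothetical choice for illustration only


def build_table(sub_x):
    """Return all 2**len(sub_x) signed sums of sub_x.

    Entry k uses the bits of k (most significant first):
    a set bit means +sub_x[j], a cleared bit means -sub_x[j].
    """
    return [
        sum(s * v for s, v in zip(signs, sub_x))
        for signs in product((-1, 1), repeat=len(sub_x))
    ]


def row_keys(row, mu=MU):
    """Pack a row of +1/-1 weights into one integer key per chunk."""
    keys = []
    for start in range(0, len(row), mu):
        key = 0
        for w in row[start:start + mu]:
            key = (key << 1) | (1 if w > 0 else 0)
        keys.append(key)
    return keys


def lut_matvec(binary_rows, x, mu=MU):
    """Multiply a +1/-1 matrix by x using per-chunk lookup tables."""
    # Tables depend only on x, so they are built once and shared by all rows.
    tables = [build_table(x[s:s + mu]) for s in range(0, len(x), mu)]
    out = []
    for row in binary_rows:
        # In a real engine the keys would be packed ahead of time, not per call.
        keys = row_keys(row, mu)
        out.append(sum(table[key] for table, key in zip(tables, keys)))
    return out


def naive_matvec(binary_rows, x):
    """Reference result: one multiply-add per weight."""
    return [sum(w * v for w, v in zip(row, x)) for row in binary_rows]


B = [[1, -1, 1, 1, -1, 1],
     [-1, -1, 1, -1, 1, 1]]
x = [1, 2, 3, 4, 5, 6]
print(lut_matvec(B, x) == naive_matvec(B, x))
```

Building a table costs work that grows with 2^MU, so the approach pays off only when many rows share the same tables. For multi-bit binary-coding quantization, the same tables could be reused across bit planes, with each plane's result scaled by its own coefficient before summing.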
Abstract
The number of parameters in deep neural networks (DNNs) is rapidly increasing to support complicated tasks and to improve model accuracy. Correspondingly, the amount of computations and required memory footprint increase as well. Quantization is an efficient method to address such concerns by compressing DNNs such that computations can be simplified while required storage footprint is significantly reduced. Unfortunately, commercial CPUs and GPUs do not fully support quantization because only fixed data transfers (such as 32 bits) are allowed. As a result, even if weights are quantized into a few bits, CPUs and GPUs cannot access multiple quantized weights without memory bandwidth waste. Success of quantization in practice, hence, relies on an efficient computation engine design, especially for matrix multiplication that is a basic computation engine in most DNNs. In this paper, we…
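The bandwidth concern in the abstract can be made concrete with a small packing sketch. It assumes 1-bit weights encoded as +1/-1 and a 32-bit transfer unit; the function names `pack_bits` and `unpack_bits` are hypothetical and do not come from the paper. Storing each weight in its own 32-bit slot moves 32 bits per weight, whereas packing moves 32 weights per word, but the weights must then be unpacked (or used directly as lookup keys) before arithmetic.

```python
WORD_BITS = 32  # assumed transfer width, matching the abstract's example


def pack_bits(weights):
    """Pack +1/-1 weights into 32-bit words, first weight in the top bit."""
    words = []
    for start in range(0, len(weights), WORD_BITS):
        chunk = weights[start:start + WORD_BITS]
        word = 0
        for w in chunk:
            word = (word << 1) | (1 if w > 0 else 0)
        # Left-align a short final chunk so bit positions stay consistent.
        word <<= WORD_BITS - len(chunk)
        words.append(word)
    return words


def unpack_bits(words, count):
    """Recover the first `count` +1/-1 weights from packed words."""
    out = []
    for word in words:
        for shift in range(WORD_BITS - 1, -1, -1):
            if len(out) == count:
                return out
            out.append(1 if (word >> shift) & 1 else -1)
    return out


weights = [1, -1] * 20           # 40 one-bit weights
packed = pack_bits(weights)      # 2 words instead of 40
print(len(packed), unpack_bits(packed, len(weights)) == weights)
```

The unpacking loop is the overhead a conventional kernel would pay on every access, which is why a scheme that consumes the packed bits directly as table indices is attractive.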
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Topic Modeling · Advanced Neural Network Applications
