NGEMM: Optimizing GEMM for Deep Learning via Compiler-based Techniques
Wenlei Bao, Li-Wen Chang, Yang Chen, Ke Deng, Amit Agarwal, Emad Barsoum, Abe Taha

TL;DR
NGEMM is a compiler-based optimized GEMM implementation that significantly accelerates low-precision matrix multiplication for deep learning, outperforming existing libraries and enhancing production service efficiency.
Contribution
This paper introduces NGEMM, a novel compiler-based approach that improves integer GEMM performance for quantized DNNs by optimizing vector unit utilization and reducing unnecessary computations.
Findings
NGEMM outperforms MKL non-pack by 1.86x on average.
NGEMM outperforms MKL pack by 1.16x on average.
Successfully deployed in Microsoft production services.
Abstract
Quantization has emerged as an effective way to significantly boost the performance of deep neural networks (DNNs) by utilizing low-bit computations. Despite their lower numerical precision, quantized DNNs reduce both memory bandwidth and computation cycles with little loss of accuracy. Integer GEMM (General Matrix Multiplication) is critical to running quantized DNN models efficiently, as GEMM operations often dominate the computations in these models. Various approaches have been developed that leverage techniques such as vectorization and memory layout to improve the performance of integer GEMM. However, these existing approaches are not fast enough in certain scenarios. We developed NGEMM, a compiler-based GEMM implementation for accelerating lower-precision training and inference. NGEMM makes better use of the vector units by avoiding unnecessary vector computation…
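To make the role of integer GEMM concrete, the sketch below shows what a quantized matrix multiplication computes: quantize both operands to 8-bit integers, multiply and accumulate in integer arithmetic, then rescale the result. This is a plain reference loop for illustration only, not NGEMM's compiler-generated code. The function names, the symmetric per-matrix scales, and the example values are assumptions chosen for this sketch.

```python
def quantize(matrix, scale):
    """Symmetric quantization (assumed scheme): round(x / scale), clamped to int8."""
    return [[max(-128, min(127, round(x / scale))) for x in row] for row in matrix]


def int_gemm(a_q, b_q):
    """Reference integer GEMM: C[i][j] = sum over p of A[i][p] * B[p][j].

    Python integers are unbounded; on real hardware the 8-bit products
    would typically be accumulated in 32-bit integers.
    """
    m, k, n = len(a_q), len(b_q), len(b_q[0])
    assert all(len(row) == k for row in a_q), "inner dimensions must match"
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = a_q[i][p]
            for j in range(n):
                c[i][j] += a_ip * b_q[p][j]
    return c


def dequantize(c_q, scale_a, scale_b):
    """Map the integer accumulators back to real values."""
    return [[v * scale_a * scale_b for v in row] for row in c_q]


# Hypothetical example data; a scale of 1/64 keeps every value exactly representable.
SCALE = 1.0 / 64
a = [[0.5, -1.0], [0.25, 0.75]]
b = [[1.0, 0.5], [-0.5, 0.25]]

a_q = quantize(a, SCALE)
b_q = quantize(b, SCALE)
c = dequantize(int_gemm(a_q, b_q), SCALE, SCALE)
```

Only the inner integer loop is the GEMM itself. Optimized implementations replace it with vector instructions and tuned memory layouts, which is the part the abstract says NGEMM targets.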
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Machine Learning and Algorithms
