A method of using RSVD in residual calculation of LowBit GEMM
Hongyaoxing Gu

TL;DR
This paper introduces LRQMM, a low-rank residual quantization method for low-precision matrix multiplication that significantly improves accuracy with minimal speed reduction and is applicable to deep learning models.
Contribution
The paper presents LRQMM, a novel low-rank residual quantization technique that enhances low-precision GEMM accuracy without requiring additional data or extensive pre-training.
Findings
Reduces quantization error by 1-2 orders of magnitude
Achieves 61.8% ImageNet Top-1 accuracy with LRQMM-4bit in ResNet-50
Only 20% speed reduction compared to standard GEMM
Abstract
The advancement of hardware technology in recent years has brought many possibilities for low-precision applications. However, low precision can introduce significant computational errors, which makes maintaining computational accuracy a considerable challenge. We propose low-rank residual quantized matrix multiplication (LRQMM), a method that introduces low-rank approximation into residual compensation for dense low-precision quantized matrix multiplication. It can improve accuracy several-fold with only BLAS-2 level extra time overhead. Moreover, LRQMM is a completely data-free quantization method that does not require additional data for pre-training. It also operates only on the low-precision GEMM operator, which makes it easy to couple with other methods. In experiments, LRQMM can reduce the error of direct quantized matrix multiplication by 1~2 orders of…
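The idea of low-rank residual compensation described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the symmetric per-tensor quantizer, the function names (`quantize`, `rsvd`, `lrq_matmul`), the rank, the oversampling, and the power-iteration count are all assumptions chosen for illustration. It quantizes A and B, takes a randomized SVD (RSVD) of each quantization residual, and adds the two first-order correction terms to the low-precision product.

```python
import numpy as np

def quantize(x, bits=8):
    """Symmetric per-tensor quantization (an assumed scheme, not the paper's)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.max(np.abs(x)), 1e-12) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def rsvd(m, rank, oversample=5, n_iter=2, seed=0):
    """Randomized SVD: sketch the range of m, then take an SVD of the small projection."""
    rng = np.random.default_rng(seed)
    k = min(rank + oversample, min(m.shape))
    q, _ = np.linalg.qr(m @ rng.standard_normal((m.shape[1], k)))
    for _ in range(n_iter):  # power iterations sharpen the range estimate
        q, _ = np.linalg.qr(m.T @ q)
        q, _ = np.linalg.qr(m @ q)
    u_small, s, vt = np.linalg.svd(q.T @ m, full_matrices=False)
    return (q @ u_small)[:, :rank], s[:rank], vt[:rank]

def lrq_matmul(a, b, bits=4, rank=16):
    """Return (plain quantized product, product with low-rank residual correction)."""
    qa, sa = quantize(a, bits)
    qb, sb = quantize(b, bits)
    a_hat, b_hat = qa * sa, qb * sb          # dequantized operands
    base = (qa @ qb) * (sa * sb)             # integer GEMM, then rescale
    ua, sva, vta = rsvd(a - a_hat, rank)     # low-rank model of A's residual
    ub, svb, vtb = rsvd(b - b_hat, rank)     # low-rank model of B's residual
    # First-order terms R_A @ B_hat + A_hat @ R_B, evaluated through thin factors.
    corr = (ua * sva) @ (vta @ b_hat) + ((a_hat @ ub) * svb) @ vtb
    return base, base + corr

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64))
b = rng.standard_normal((64, 64))
exact = a @ b
base, corrected = lrq_matmul(a, b, bits=4, rank=16)
err_base = np.linalg.norm(exact - base) / np.linalg.norm(exact)
err_corr = np.linalg.norm(exact - corrected) / np.linalg.norm(exact)
print(f"quantized only: {err_base:.4f}  with low-rank residual: {err_corr:.4f}")
```

On random Gaussian inputs like these, the quantization residual is close to full-rank noise, so the gain from a rank-16 correction is modest. The sketch only shows the mechanics and makes no attempt to reproduce the accuracy or speed figures reported for LRQMM.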
Taxonomy
Topics: Advanced Measurement and Metrology Techniques · Optical Systems and Laser Technology · Welding Techniques and Residual Stresses
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
