A method of using RSVD in residual calculation of LowBit GEMM

Hongyaoxing Gu

arXiv:2409.18772 · cs.MS · September 30, 2024

TL;DR

This paper introduces LRQMM, a low-rank residual quantization method for low-precision matrix multiplication. It reduces quantization error by 1-2 orders of magnitude at a 20% speed reduction relative to standard GEMM, and it applies to deep learning models.

Contribution

The paper presents LRQMM, a novel low-rank residual quantization technique that enhances low-precision GEMM accuracy without requiring additional data or extensive pre-training.

Findings

01. Reduces quantization error by 1-2 orders of magnitude

02. Achieves 61.8% ImageNet Top-1 accuracy with LRQMM-4bit in ResNet-50

03. Only 20% speed reduction compared to standard GEMM

Abstract

Recent advances in hardware technology have opened up many possibilities for low-precision applications. However, low precision can introduce significant computational errors, which makes maintaining computational accuracy a considerable challenge. We propose the low-rank residual quantized matrix multiplication (LRQMM) method, which introduces low-rank approximation into the residual compensation of dense low-precision quantized matrix multiplication. It improves accuracy severalfold with only BLAS-2-level extra time overhead. Moreover, LRQMM is a completely data-free quantization method that requires no additional data for pre-training. It also operates only on the low-precision GEMM operator, which makes it easy to combine with other methods. In experiments, LRQMM reduces the error of direct quantized matrix multiplication by 1~2 orders of…
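
The idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the symmetric per-tensor quantizer, the helper names (`quantize`, `rsvd`, `lowrank_compensated_matmul`), the default rank, and the use of floating point for the correction terms are all assumptions made for illustration. The sketch quantizes both operands, runs the integer GEMM, and then adds first-order correction terms built from a randomized SVD (RSVD) of each quantization residual.

```python
import numpy as np

def quantize(x, bits=4):
    """Symmetric per-tensor quantization (an assumed scheme, not the paper's)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return codes, scale

def rsvd(m, rank, oversample=8, n_iter=2, rng=None):
    """Randomized SVD: returns (left, right) with m ~= left @ right."""
    rng = np.random.default_rng() if rng is None else rng
    k = min(rank + oversample, min(m.shape))
    # Sample the range of m with a Gaussian test matrix.
    q, _ = np.linalg.qr(m @ rng.standard_normal((m.shape[1], k)))
    # A few power iterations sharpen the captured subspace.
    for _ in range(n_iter):
        q, _ = np.linalg.qr(m.T @ q)
        q, _ = np.linalg.qr(m @ q)
    # Exact SVD of the small projected matrix, truncated to `rank`.
    u_small, s, vt = np.linalg.svd(q.T @ m, full_matrices=False)
    left = (q @ u_small[:, :rank]) * s[:rank]
    return left, vt[:rank]

def lowrank_compensated_matmul(a, b, bits=4, rank=8, rng=None):
    """Quantized GEMM plus low-rank residual corrections (illustrative only)."""
    qa, sa = quantize(a, bits)
    qb, sb = quantize(b, bits)
    a_hat, b_hat = qa * sa, qb * sb          # dequantized operands
    c = (qa @ qb) * (sa * sb)                # low-bit GEMM, rescaled
    ua, va = rsvd(a - a_hat, rank, rng=rng)  # residual of A ~= ua @ va
    ub, vb = rsvd(b - b_hat, rank, rng=rng)  # residual of B ~= ub @ vb
    # First-order corrections; thin factors keep the extra cost low.
    c += ua @ (va @ b_hat)
    c += (a_hat @ ub) @ vb
    return c
```

How much a small `rank` helps depends on how concentrated the spectrum of the residual is, so the accuracy figures reported for LRQMM should not be expected from this sketch.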

Taxonomy

Topics: Advanced Measurement and Metrology Techniques · Optical Systems and Laser Technology · Welding Techniques and Residual Stresses

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings