Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other

Yifei Gao; Jie Ou; Lei Wang; Yuting Xiao; Zhiyuan Xiang; Ruiting Dai; Jun Cheng

arXiv:2406.16299·cs.CL·June 25, 2024

TL;DR

This paper introduces Learnable Singular value Increment (LSI), a method that applies SVD to the weights of a quantized large language model and makes the singular values learnable so that weights can compensate for each other's quantization error, significantly improving accuracy across various low-bit quantization scenarios.

Contribution

The paper proposes LSI, a new technique combining SVD and learnable singular values to enhance weight compensation in quantized models, achieving state-of-the-art results.

Findings

01 State-of-the-art performance in diverse quantization settings

02 Effective in weight-only, weight-activation, and extremely low-bit scenarios

03 Enables efficient finetuning on quantized models

Abstract

Emergent Large Language Models (LLMs) are distinguished from traditional language models by their extraordinary performance and powerful deduction capacity. However, their computational and storage costs are very high, which has made quantization an active topic. To address the accuracy loss caused by quantization, two lines of work in post-training quantization stand out. One uses other weights to compensate for existing quantization error, while the other transfers the quantization difficulty to other parts of the model. Combining the merits of both, we introduce Learnable Singular value Increment (LSI) as an advanced solution. LSI uses Singular Value Decomposition to extract the singular values of the weights and makes them learnable, helping weights compensate for each other conditioned on activation. Incorporating LSI with existing techniques, we achieve…
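The idea in the abstract (decompose the weights with SVD, add a learnable increment to the singular values, and judge the result on activations) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`fake_quantize`, `rebuild`, `output_error`, `delta_grad`), the per-row round-to-nearest quantizer, the straight-through gradient estimator, the step size, and the random calibration data are all assumptions made for illustration.

```python
import numpy as np

def fake_quantize(w, n_bits=4):
    """Per-row symmetric round-to-nearest quantization (quantize, then dequantize)."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / q_max
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero rows
    return np.clip(np.round(w / scale), -q_max - 1, q_max) * scale

def rebuild(u, s, vt, delta):
    """Reassemble the weight matrix from SVD factors with incremented singular values."""
    return (u * (s + delta)) @ vt

def output_error(x, w_ref, w_q):
    """Mean squared difference between full-precision and quantized layer outputs."""
    return float(np.mean((x @ w_ref.T - x @ w_q.T) ** 2))

def delta_grad(x, w_ref, u, s, vt, delta, n_bits=4):
    """Gradient of output_error w.r.t. delta under a straight-through estimator (assumed)."""
    w_q = fake_quantize(rebuild(u, s, vt, delta), n_bits)
    resid = x @ w_q.T - x @ w_ref.T          # shape (batch, out)
    g_w = 2.0 * resid.T @ x / resid.size     # dL/dW_q, shape (out, in)
    # STE: treat rounding as identity; d W'/d delta_k = outer(u[:, k], vt[k, :]).
    return np.einsum("oi,ok,ki->k", g_w, u, vt)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))    # hypothetical layer weights (out, in)
x = rng.normal(size=(32, 16))   # hypothetical calibration activations (batch, in)

u, s, vt = np.linalg.svd(w, full_matrices=False)
delta = np.zeros_like(s)        # the learnable singular value increment

baseline = output_error(x, w, fake_quantize(w))
best_delta, best_err = delta.copy(), baseline
for _ in range(100):
    delta -= 0.05 * delta_grad(x, w, u, s, vt, delta)
    err = output_error(x, w, fake_quantize(rebuild(u, s, vt, delta)))
    if err < best_err:          # keep the best increment seen so far
        best_delta, best_err = delta.copy(), err

print(f"baseline error: {baseline:.6f}, best error with increment: {best_err:.6f}")
```

Because a single increment on one singular value shifts every entry of the rebuilt matrix, the quantized weights move together rather than independently, which is one way to read "weights compensate each other". The loop keeps the best increment seen, so the reported error never exceeds the plain round-to-nearest baseline; how much it improves depends on the data and is not claimed here.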

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis