SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization

Runsheng Bai; Bo Liu; Qiang Liu

arXiv:2412.04180·cs.LG·December 10, 2024

TL;DR

SKIM introduces a novel quantization method for large language models that optimally allocates bits and uses trainable scaling, significantly reducing performance loss at low precision levels.

Contribution

The paper presents SKIM, a new quantization approach with a greedy bit allocation algorithm and trainable scaling, enabling effective any-bit quantization of LLMs.

Findings

1. Reduces perplexity gap by 16.3% at 3-bit quantization

2. Improves performance of low-bit LLaMA models

3. Adapts to any given bit level efficiently

Abstract

Large Language Models (LLMs) exhibit impressive performance across various tasks, but deploying them for inference poses challenges. Their high resource demands often necessitate complex, costly multi-GPU pipelines, or the use of smaller, less capable models. While quantization offers a promising solution utilizing lower precision for model storage, existing methods frequently experience significant performance drops at lower precision levels. Additionally, they typically provide only a limited set of solutions at specific bit levels, many of which are extensively manually tuned. To address these challenges, we propose a new method called SKIM: Scaled K-means clustering wIth Mixed precision. Our approach introduces two novel techniques: 1. A greedy algorithm to solve approximately optimal bit allocation across weight channels, and 2. A trainable scaling vector for non-differentiable…
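The abstract's first technique, greedy bit allocation across weight channels, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes each channel is quantized independently with plain 1-D k-means (2**bits centroids), that the objective is total squared reconstruction error, and that the budget is an average bit width over channels. The function names `kmeans_1d_error` and `greedy_bit_allocation`, the `min_bits`/`max_bits` limits, and the toy weight matrix are all hypothetical.

```python
import numpy as np

def kmeans_1d_error(values, bits, iters=20):
    """Squared error after clustering `values` into at most 2**bits centroids (Lloyd's algorithm)."""
    k = min(2 ** bits, np.unique(values).size)
    # Initialise centroids at evenly spaced quantiles of the data.
    centroids = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        assign = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            members = values[assign == j]
            if members.size:
                centroids[j] = members.mean()
    assign = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
    return float(((values - centroids[assign]) ** 2).sum())

def greedy_bit_allocation(weights, avg_bits, min_bits=1, max_bits=4):
    """Assign a bit width to each row (channel) of `weights` under an average-bit budget."""
    n_channels = weights.shape[0]
    bits = np.full(n_channels, min_bits, dtype=int)
    budget = int(round(avg_bits * n_channels)) - int(bits.sum())
    # err[c, i] = quantization error of channel c at (min_bits + i) bits.
    err = np.array([[kmeans_1d_error(w, b) for b in range(min_bits, max_bits + 1)]
                    for w in weights])
    for _ in range(budget):
        cur = bits - min_bits
        gain = np.full(n_channels, -np.inf)
        idx = np.where(bits < max_bits)[0]
        # Error reduction from granting each eligible channel one more bit.
        gain[idx] = err[idx, cur[idx]] - err[idx, cur[idx] + 1]
        best = int(gain.argmax())
        if not np.isfinite(gain[best]):
            break  # every channel is already at max_bits
        bits[best] += 1
    return bits

# Toy example (assumed data): 8 channels whose magnitudes differ by orders of magnitude.
rng = np.random.default_rng(0)
scales = np.array([0.01, 0.01, 0.1, 0.1, 1.0, 1.0, 10.0, 10.0])
weights = rng.normal(size=(8, 256)) * scales[:, None]
bits = greedy_bit_allocation(weights, avg_bits=2.5)
print(bits, bits.mean())
```

Because the budget is spent one bit at a time, the same routine works for fractional averages such as 2.5 bits, which is the sense in which a mixed-precision scheme can target "any" bit level. The trainable scaling vector mentioned in the abstract is not modelled here.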

Taxonomy

Topics: Photonic and Optical Devices

Methods: Sparse Evolutionary Training · LLaMA · k-Means Clustering