DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs

Yingsong Luo; Ling Chen

arXiv:2410.12187 · cs.LG · October 18, 2024

TL;DR

This paper introduces DAQ, a density-aware post-training weight-only quantization method for large language models. It improves accuracy by aligning high-density weight regions with the high-precision regions of the floating-point format and by optimizing the quantization parameters.

Contribution

DAQ is presented as the first method to combine density-aware alignment with learnable dynamic range adjustment for weight-only quantization of LLMs, and it reduces perplexity degradation relative to existing methods.

Findings

01

Reduces perplexity loss by an average of 22.8% on LLaMA relative to the best baseline

02

Reduces perplexity loss by an average of 19.6% on LLaMA-2 relative to the best baseline

03

Consistently outperforms the best baseline quantization method
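The headline percentages are relative figures. Assuming, as is common in quantization papers, that "perplexity loss" means the increase in perplexity of the quantized model over the full-precision (FP16) model, a reduction of r relative to the best baseline can be read as below. The symbols are introduced here for illustration and are not taken from the paper.

```latex
\Delta_{m} = \mathrm{PPL}_{m} - \mathrm{PPL}_{\mathrm{FP16}}, \qquad
r = \frac{\Delta_{\mathrm{baseline}} - \Delta_{\mathrm{DAQ}}}{\Delta_{\mathrm{baseline}}} \times 100\%
```

For example, with hypothetical numbers: if FP16 perplexity is 5.00, the best baseline reaches 5.50 (a loss of 0.50) and DAQ reaches 5.39 (a loss of 0.39), then r = 0.11 / 0.50 = 22%. The reported 22.8% and 19.6% are averages, per the abstract.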

Abstract

Large language models (LLMs) excel in various tasks but face deployment challenges due to hardware constraints. We propose density-aware post-training weight-only quantization (DAQ), which has two stages: 1) density-centric alignment, which identifies the center of high-density weights and centers the dynamic range on this point to align high-density weight regions with floating-point high-precision regions; 2) learnable dynamic range adjustment, which adjusts the dynamic range by optimizing quantization parameters (i.e., scale and zero-point) based on the impact of weights on the model output. Experiments on LLaMA and LLaMA-2 show that DAQ consistently outperforms the best baseline method, reducing perplexity loss by an average of 22.8% on LLaMA and 19.6% on LLaMA-2. Our code is available at https://github.com/LuoYingSong/DAQ.
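The two stages described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation (the official code is in the linked repository) and rests on several assumptions: the target format is signed FP4 (E2M1), whose representable values cluster near zero; the "center of high-density weights" is estimated as the midpoint of the fullest histogram bin; and the learnable adjustment of scale and zero-point is replaced by a small grid search that minimizes an importance-weighted reconstruction error, with hypothetical per-weight importance scores standing in for each weight's impact on the model output. All function names are hypothetical, and the sketch handles a single 1-D weight group.

```python
import numpy as np

# Assumed target format: signed FP4 (E2M1). Its representable values are
# densest near zero, i.e. the "high-precision region" to align with.
_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-_POS[:0:-1], _POS])  # 15 values from -6 to 6


def quantize_fp(w, scale, center):
    """Round (w - center) / scale to the nearest FP4 value, then map back."""
    x = (w - center) / scale
    idx = np.abs(x[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return FP4_GRID[idx] * scale + center


def density_center(w, bins=64):
    """Assumed stand-in for stage 1: midpoint of the fullest histogram bin."""
    counts, edges = np.histogram(w, bins=bins)
    k = int(counts.argmax())
    return 0.5 * (edges[k] + edges[k + 1])


def init_params(w):
    """Center the dynamic range on the density center so dense weights map near zero."""
    center = density_center(w)
    # Scale so that the weight farthest from the center maps to the largest FP4 value.
    scale = max(np.abs(w - center).max() / FP4_GRID.max(), 1e-12)
    return scale, center


def weighted_error(w, importance, scale, center):
    """Importance-weighted squared reconstruction error (proxy for output impact)."""
    return float(np.sum(importance * (w - quantize_fp(w, scale, center)) ** 2))


def adjust_params(w, importance, scale, center,
                  factors=(0.7, 0.8, 0.9, 1.0),
                  shifts=(-0.25, -0.125, 0.0, 0.125, 0.25)):
    """Stand-in for stage 2: grid search over scale and center, not gradient-based learning."""
    best = (np.inf, scale, center)
    for f in factors:          # shrink the dynamic range by a factor
        for d in shifts:       # shift the center, in units of the current scale
            s, c = scale * f, center + d * scale
            err = weighted_error(w, importance, s, c)
            if err < best[0]:
                best = (err, s, c)
    return best  # (error, scale, center)


rng = np.random.default_rng(0)
w = rng.standard_normal(512) * 0.02 + 0.005        # one weight group, slightly off-center
importance = rng.uniform(0.5, 2.0, size=w.shape)   # hypothetical per-weight sensitivity

s0, c0 = init_params(w)
err0 = weighted_error(w, importance, s0, c0)
err1, s1, c1 = adjust_params(w, importance, s0, c0)
print(f"init error {err0:.3e} -> adjusted error {err1:.3e}")
```

Because the search grid includes the initial parameters (factor 1.0, shift 0.0), the adjusted error can never exceed the initial one. A learnable version would instead treat scale and center as trainable parameters; one common way to do that is to pass gradients through the rounding step with a straight-through estimator.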

Code & Models

Repositories

luoyingsong/daq (PyTorch, official)