Sliced-Wasserstein Distribution Alignment Loss Improves the Ultra-Low-Bit Quantization of Large Language Models
Deyu Cao, Yixin Yin, and Samin Aref

TL;DR
This paper introduces a sliced Wasserstein loss function for ultra-low-bit post-training quantization of large language models, improving accuracy and efficiency without extra inference cost.
Contribution
It proposes a novel distribution-aware calibration loss that aligns full-precision and quantized model outputs, enhancing existing quantization methods like OmniQuant and TesseraQ.
Findings
Improves perplexity and task accuracy across multiple ultra-low-bit settings.
Recovers 4.12-20.37% of accuracy loss in LLaMA-2-7B.
Demonstrates effectiveness across different models and quantization frameworks.
Abstract
The benefits of most large language models come with steep and often hidden economic and environmental costs due to their inefficient resource usage during deployment. Model quantization improves energy and memory efficiency by representing model parameters with lower-precision values. However, compression below 4 bits often distorts activation distributions and degrades performance. We address this challenge by introducing a sliced Wasserstein loss function for distribution-aware calibration in ultra-low-bit post-training quantization. The proposed loss aligns the output distributions of full-precision and quantized models under random linear projections, complementing standard mean-squared error loss without adding any computational overhead during inference. Our proposed loss function can be incorporated into any post-training quantization framework that has a retraining…
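The idea of aligning output distributions under random linear projections can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the number of projections, the use of the squared 2-Wasserstein distance, and the mixing weight `weight` are all assumptions made for illustration, and the summary above does not specify any of them.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=64, rng=None):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    x, y: arrays of shape (n, d) holding the same number of samples,
    e.g. full-precision and quantized block outputs on a calibration batch.
    """
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    # Draw random directions and normalize them onto the unit sphere
    # (an assumed sampling scheme for the random linear projections).
    theta = rng.standard_normal((d, n_projections))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    # Project to 1-D and sort each projection: in one dimension, matching
    # sorted samples gives the optimal transport plan.
    px = np.sort(x @ theta, axis=0)
    py = np.sort(y @ theta, axis=0)
    return float(np.mean((px - py) ** 2))


def calibration_loss(fp_out, q_out, weight=0.1, n_projections=64, rng=None):
    """Hypothetical combined objective: MSE plus a weighted sliced term."""
    mse = float(np.mean((fp_out - q_out) ** 2))
    swd = sliced_wasserstein(fp_out, q_out, n_projections, rng)
    return mse + weight * swd


if __name__ == "__main__":
    gen = np.random.default_rng(0)
    fp = gen.standard_normal((256, 16))               # stand-in full-precision outputs
    q = fp + 0.05 * gen.standard_normal((256, 16))    # stand-in quantized outputs
    print(calibration_loss(fp, q, rng=0))
```

Because the sliced term only appears in the calibration objective, a sketch like this is consistent with the claim of no added inference cost: nothing about the deployed quantized model changes. In an actual calibration loop the same computation would be written in an automatic-differentiation framework so that gradients flow to the quantization parameters.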
Taxonomy
Topics: Advanced Neural Network Applications · Natural Language Processing Techniques · Big Data and Digital Economy
