Sliced-Wasserstein Distribution Alignment Loss Improves the Ultra-Low-Bit Quantization of Large Language Models

Deyu Cao, Yixin Yin, and Samin Aref

arXiv:2601.07878 · cs.LG · January 14, 2026

TL;DR

This paper introduces a sliced Wasserstein loss function for ultra-low-bit post-training quantization of large language models that improves accuracy without adding inference cost.

Contribution

It proposes a distribution-aware calibration loss that aligns the outputs of full-precision and quantized models and can be added to existing quantization methods such as OmniQuant and TesseraQ.
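To make concrete what "full-precision and quantized model outputs" means for a single layer, the sketch below simulates plain round-to-nearest uniform weight quantization with one scale and zero point per output channel, then measures how far the quantized layer's outputs drift from the full-precision ones as the bit width drops. This is an illustrative assumption, not the OmniQuant or TesseraQ procedure (both use more elaborate, learned calibration); the function name `fake_quantize` and the random weights and inputs are hypothetical.

```python
import numpy as np


def fake_quantize(w, bits):
    """Round-to-nearest asymmetric uniform quantization with one scale per output row.

    Returns the dequantized weights so they can be used in an ordinary matmul,
    which is a common way to simulate quantization during calibration.
    """
    levels = 2 ** bits - 1                              # integer codes 0..levels (0..3 at 2 bits)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)            # avoid division by zero for constant rows
    zero_point = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero_point, 0, levels)  # integer codes
    return (q - zero_point) * scale                     # map codes back to real values


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))   # hypothetical layer weight, shape (out_features, in_features)
X = rng.normal(size=(32, 16))  # hypothetical calibration inputs, shape (tokens, in_features)
y_fp = X @ W.T                 # full-precision outputs
for bits in (4, 3, 2):
    y_q = X @ fake_quantize(W, bits).T  # outputs of the simulated quantized layer
    print(f"{bits}-bit output MSE: {np.mean((y_fp - y_q) ** 2):.4f}")
```

At 2 bits each row of the weight matrix keeps at most four distinct values, and the output error grows as the bit width drops; this is the below-4-bit regime in which, per the abstract, activation distributions are often distorted.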

Findings

1. Improves perplexity and downstream task accuracy across multiple ultra-low-bit settings.
2. Recovers 4.12–20.37% of the accuracy lost to quantization on LLaMA-2-7B.
3. Demonstrates effectiveness across different models and quantization frameworks.
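The 4.12–20.37% figure is a relative quantity. Assuming it follows the usual convention of measuring how much of the gap between the full-precision model and the baseline quantized model is closed (this page does not spell out the definition), it can be written as:

```latex
\text{recovered} = \frac{A_{\text{SW}} - A_{\text{base}}}{A_{\text{FP}} - A_{\text{base}}} \times 100\%
```

Here A_FP, A_base and A_SW are hypothetical labels for the accuracy of the full-precision model, the baseline quantized model, and the same quantizer calibrated with the added loss. With made-up numbers A_FP = 60, A_base = 40 and A_SW = 42, the recovered share would be 2/20 = 10%.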

Abstract

The benefits of most large language models come with steep and often hidden economic and environmental costs due to their inefficient resource usage during deployment. Model quantization improves energy and memory efficiency by representing model parameters with lower-precision values. However, compression below 4 bits often distorts activation distributions and degrades performance. We address this challenge by introducing a sliced Wasserstein loss function for distribution-aware calibration in ultra-low-bit post-training quantization. The proposed loss aligns the output distributions of full-precision and quantized models under random linear projections, complementing the standard mean-squared error loss without adding any computational overhead during inference. Our proposed loss function can be incorporated into any post-training quantization framework that has a retraining…
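A minimal NumPy sketch of a sliced Wasserstein term added to an MSE loss, assuming that layer or block outputs are treated as point clouds with one row per token and one column per hidden dimension (the abstract does not say exactly which outputs are aligned, how the two terms are weighted, or how many projections are used). Each random unit direction reduces both clouds to 1-D samples, where the 2-Wasserstein distance between two equal-size empirical distributions is obtained by sorting both and matching samples in order; averaging over directions gives a Monte Carlo estimate of the squared sliced distance. The names `sliced_wasserstein`, `calibration_loss`, `lam` and `n_proj` are hypothetical.

```python
import numpy as np


def sliced_wasserstein(a, b, n_proj=64, rng=None):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    a, b: arrays of shape (n, d) holding n samples each (equal n assumed).
    n_proj: number of random projection directions (hypothetical default).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = a.shape[1]
    theta = rng.normal(size=(d, n_proj))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)  # unit-norm directions
    # Project both clouds onto every direction, then sort each 1-D projection.
    pa = np.sort(a @ theta, axis=0)
    pb = np.sort(b @ theta, axis=0)
    # In 1-D, optimal transport between equal-size empirical distributions
    # pairs the i-th smallest sample of one with the i-th smallest of the other.
    return float(np.mean((pa - pb) ** 2))


def calibration_loss(y_fp, y_q, lam=1.0, n_proj=64, rng=None):
    """Hypothetical combined objective: element-wise MSE plus lam * sliced Wasserstein."""
    mse = float(np.mean((y_fp - y_q) ** 2))
    return mse + lam * sliced_wasserstein(y_fp, y_q, n_proj, rng)


rng = np.random.default_rng(0)
y_fp = rng.normal(size=(128, 32))        # hypothetical full-precision outputs (tokens x hidden)
y_shuffled = y_fp[rng.permutation(128)]  # same distribution, rows in a different order
print("MSE:", np.mean((y_fp - y_shuffled) ** 2))               # large: compares row by row
print("SW: ", sliced_wasserstein(y_fp, y_shuffled, rng=rng))  # near zero: compares distributions only
```

Because the sliced term compares sorted projections, it is essentially unchanged when rows are shuffled while MSE is not, which is the sense in which it is distribution-aware rather than element-wise. For actual calibration the same computation would be written in an autograd framework so gradients reach the quantization parameters; the term is only used during calibration, so it adds nothing at inference time.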


Taxonomy

Topics: Advanced Neural Network Applications · Natural Language Processing Techniques · Big Data and Digital Economy