Fitting Is Not Enough: Smoothness in Extremely Quantized LLMs

Yuzhuang Xu; Xu Han; Yuxuan Li; Pengzhan Li; Wanxiang Che

arXiv:2605.08894·cs.CL·May 18, 2026

TL;DR

This paper shows that extremely quantized large language models suffer from smoothness degradation that lowers generation quality, and proposes a smoothness-preserving principle that improves performance beyond what gains in numerical accuracy alone provide.

Contribution

The paper identifies smoothness preservation as an important objective in extreme quantization of LLMs and demonstrates its benefits over methods that target numerical accuracy alone.

Findings

01. Smoothness degradation worsens as bit-width decreases.

02. Preserving smoothness improves generation quality beyond numerical accuracy.

03. A simple smoothness-preserving principle enhances quantized LLM performance.

Abstract

Large language models (LLMs) achieve strong performance but incur high deployment costs, motivating extremely low-bit but lossy quantization. Existing quantization algorithms mainly focus on improving the numerical accuracy of forward computation to eliminate performance degradation. In this paper, we show that extremely quantized LLMs suffer from systematic smoothness degradation beyond numerical precision loss. Through a smoothness proxy, we observe that such degradation becomes increasingly severe as the quantization bit-width decreases. Furthermore, based on sequence neighborhood modeling, we find that quantized models exhibit a rapid reduction of effective token candidates within the prediction neighborhood, which directly leads to a sparser decoding tree and degraded generation quality. To validate this finding, we introduce a simple smoothness-preserving principle in both post-training…
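The abstract links a shrinking set of effective token candidates to a sparser decoding tree, but the truncated text does not say how candidates are counted. The sketch below is only an illustration under assumed definitions: a token counts as a candidate when its next-token probability is at least a hypothetical `threshold`, and `entropy_effective_size` uses exp(entropy) as a second, standard measure of how many tokens a distribution effectively spreads over. The function names and both logit vectors are invented for illustration; they are not taken from the paper or its repository, and the peaked logits are hand-made rather than produced by a quantized model.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def effective_candidates(probs, threshold=0.1):
    """Count tokens with probability >= threshold (assumed definition)."""
    return sum(1 for p in probs if p >= threshold)


def entropy_effective_size(probs):
    """exp(Shannon entropy): the equivalent number of equally likely tokens."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.exp(h)


# Hand-made logits: one spread-out distribution and one sharply peaked one.
smooth_logits = [2.0, 1.8, 1.6, 1.4, 0.2]
peaked_logits = [6.0, 1.8, 1.6, 1.4, 0.2]

smooth = softmax(smooth_logits)
peaked = softmax(peaked_logits)

for name, probs in (("smooth", smooth), ("peaked", peaked)):
    print(name, effective_candidates(probs), round(entropy_effective_size(probs), 2))
```

If each decoding step keeps only the tokens counted by `effective_candidates`, the number of branches explored over several steps grows roughly as the product of those per-step counts, so a drop in candidates at every step compounds into a much smaller tree.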

Code & Models

Repositories

xuyuzhuang11/FINE (GitHub)
