SiLQ: Simple Large Language Model Quantization-Aware Training

Steven K. Esser; Jeffrey L. McKinstry; Deepika Bablani; Rathinakumar Appuswamy; Dharmendra S. Modha

arXiv:2507.16933 · cs.LG · July 24, 2025

Open Access

TL;DR

SiLQ is a simple quantization-aware training method for large language models. It outperforms the leading published quantization methods on several modern benchmarks while adding less than 0.1% to the total model training budget, and it avoids mechanisms that are incompatible with specialized inference accelerators.

Contribution

The paper presents a simple, end-to-end quantization-aware training approach that generalizes across model architectures, can be applied to activations, cache, and weights, and adds no operations to the model other than the quantization itself.

Findings

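01. Outperforms the leading published quantization methods by large margins on several modern benchmarks, for both base and instruct model variants.

02. Increases the total model training budget by less than 0.1%.

03. Generalizes across model architectures and can be applied to activations, cache, and weights.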

Abstract

Large language models can be quantized to reduce inference time latency, model size, and energy consumption, thereby delivering a better user experience at lower cost. A challenge exists to deliver quantized models with minimal loss of accuracy in reasonable time, and in particular to do so without requiring mechanisms incompatible with specialized inference accelerators. Here, we demonstrate a simple, end-to-end quantization-aware training approach that, with an increase in total model training budget of less than 0.1%, outperforms the leading published quantization methods by large margins on several modern benchmarks, with both base and instruct model variants. The approach easily generalizes across different model architectures, can be applied to activations, cache, and weights, and requires the introduction of no additional operations to the model other than the quantization itself.
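
To make the phrase "no additional operations to the model other than the quantization itself" concrete, here is a minimal sketch of the generic quantize-dequantize ("fake quantization") step that quantization-aware training typically inserts, together with a straight-through gradient rule. This is not the paper's implementation: the function names `fake_quantize` and `ste_grad`, the fixed `step` size, and the 4-bit default are all assumptions chosen for illustration, and the abstract does not say how SiLQ sets step sizes or bit widths.

```python
from typing import List


def fake_quantize(values: List[float], step: float, bits: int = 4) -> List[float]:
    """Snap each value to a signed integer grid, clip, then rescale.

    `step` and `bits` are illustrative assumptions, not values from the paper.
    """
    q_min = -(2 ** (bits - 1))       # lowest integer level, e.g. -8 for 4 bits
    q_max = 2 ** (bits - 1) - 1      # highest integer level, e.g. +7 for 4 bits
    out = []
    for v in values:
        q = round(v / step)              # nearest level (Python rounds ties to even)
        q = max(q_min, min(q_max, q))    # clip to the representable range
        out.append(q * step)             # map back to the real-valued scale
    return out


def ste_grad(values: List[float], step: float, upstream: List[float],
             bits: int = 4) -> List[float]:
    """Straight-through estimator: treat rounding as identity in the backward
    pass, and zero the gradient for values that fell outside the clip range."""
    q_min = -(2 ** (bits - 1))
    q_max = 2 ** (bits - 1) - 1
    return [
        g if q_min <= v / step <= q_max else 0.0
        for v, g in zip(values, upstream)
    ]


if __name__ == "__main__":
    x = [0.26, -0.9, 3.0]
    print(fake_quantize(x, step=0.25))               # values on the 4-bit grid
    print(ste_grad(x, 0.25, [1.0, 1.0, 1.0]))        # gradient passed or blocked
```

The same quantize-dequantize function can in principle wrap weights, activations, or cached keys and values, which is one plausible reading of why the approach needs no other changes to the model graph.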

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling