SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
Han Liu, Haotian Gao, Xiaotong Zhang, Changya Li, Feng Zhang, Wei Wang, Fenglong Ma, Hong Yu

TL;DR
SEPTQ introduces a simple, two-step post-training quantization method for large language models that improves efficiency and performance, especially at low-bit levels, without retraining.
Contribution
It proposes a novel, straightforward quantization paradigm that simplifies existing methods and enhances low-bit quantization performance for large language models.
Findings
SEPTQ outperforms existing methods in low-bit quantization scenarios.
The method maintains high model quality with reduced computational complexity.
Experimental results show significant improvements across various datasets and model sizes.
Abstract
Large language models (LLMs) have shown remarkable performance in various domains, but they are constrained by massive computational and storage costs. Quantization, an effective technique for compressing models to fit resource-limited devices while preserving generative quality, encompasses two primary methods: quantization-aware training (QAT) and post-training quantization (PTQ). QAT involves additional retraining or fine-tuning, which inevitably incurs a high training cost and makes it unsuitable for LLMs. Consequently, PTQ has become the focus of recent quantization research. However, existing PTQ methods usually rely on complex computation procedures and suffer considerable performance degradation under low-bit quantization settings. To alleviate these issues, we propose a simple and effective post-training quantization paradigm for LLMs, named…
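For readers new to the PTQ setting described in the abstract, the following is a minimal sketch of plain round-to-nearest (RTN) uniform weight quantization, which needs no retraining. It is a generic baseline and not SEPTQ's two-step procedure, which the truncated abstract does not specify. The names `quantize_rtn` and `mse`, the per-row min/max scaling, and the Gaussian toy weights are all assumptions made for illustration.

```python
import random


def quantize_rtn(weights, bits):
    """Uniform asymmetric round-to-nearest quantization of one weight row.

    Hypothetical helper for illustration only; not the SEPTQ algorithm.
    Returns the integer codes and the dequantized (reconstructed) weights.
    """
    levels = (1 << bits) - 1              # highest integer code, e.g. 15 for 4 bits
    w_min, w_max = min(weights), max(weights)
    # Step size between adjacent quantization levels; guard against a constant row.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    # Map each weight to the nearest integer code and clamp to [0, levels].
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    # Map the codes back to floating point to measure the rounding error.
    dequant = [c * scale + w_min for c in codes]
    return codes, dequant


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy "weight row": assumed Gaussian values, not weights from a real model.
random.seed(0)
row = [random.gauss(0.0, 1.0) for _ in range(1024)]

# Reconstruction error at several bit widths.
errors = {bits: mse(row, quantize_rtn(row, bits)[1]) for bits in (8, 4, 3, 2)}

for bits, err in errors.items():
    print(f"{bits}-bit RTN reconstruction MSE: {err:.6f}")
```

On this toy row the reconstruction error grows as the bit width shrinks. That is the low-bit degradation the abstract refers to, and the reason methods beyond naive rounding are studied.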
