SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models

Han Liu; Haotian Gao; Xiaotong Zhang; Changya Li; Feng Zhang; Wei Wang; Fenglong Ma; Hong Yu

arXiv:2604.10091 · cs.CL · April 14, 2026

TL;DR

SEPTQ introduces a simple, two-step post-training quantization method for large language models that improves efficiency and performance, especially at low-bit levels, without retraining.

Contribution

The paper proposes a straightforward quantization paradigm that simplifies existing post-training methods and improves low-bit quantization performance for large language models.

Findings

01. SEPTQ outperforms existing methods in low-bit quantization scenarios.

02. The method maintains high model quality with reduced computational complexity.

03. Experimental results show significant improvements across various datasets and model sizes.

Abstract

Large language models (LLMs) have shown remarkable performance in various domains, but they are constrained by massive computational and storage costs. Quantization, an effective technique for compressing models to fit resource-limited devices while preserving generative quality, encompasses two primary methods: quantization-aware training (QAT) and post-training quantization (PTQ). QAT requires additional retraining or fine-tuning, which incurs a high training cost and makes it unsuitable for LLMs. Consequently, PTQ has become the focus of recent quantization research. However, existing PTQ methods usually rely on complex computation procedures and suffer from considerable performance degradation under low-bit quantization settings. To alleviate these issues, we propose a simple and effective post-training quantization paradigm for LLMs, named…
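
To make the low-bit degradation mentioned in the abstract concrete, here is a minimal sketch of plain round-to-nearest (RTN) weight quantization, a generic PTQ baseline. This is not the SEPTQ procedure, which the truncated abstract does not describe. The function names `quantize_rtn` and `mse`, the Gaussian weight distribution, and the single per-tensor scale are all illustrative assumptions.

```python
import random


def quantize_rtn(weights, bits):
    """Symmetric round-to-nearest quantization (illustrative baseline, not SEPTQ).

    Maps each weight to a signed integer grid with one shared scale,
    then returns the dequantized floating-point values.
    """
    qmax = 2 ** (bits - 1) - 1            # largest integer level, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)              # all-zero input needs no quantization
    scale = max_abs / qmax                # one scale for the whole tensor (assumption)
    dequantized = []
    for w in weights:
        q = round(w / scale)              # nearest integer level
        q = max(-qmax, min(qmax, q))      # clamp to the representable range
        dequantized.append(q * scale)
    return dequantized


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical weights: small Gaussian values standing in for one layer.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {bits: mse(weights, quantize_rtn(weights, bits)) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit RTN: MSE = {err:.3e}")
```

The reconstruction error grows sharply as the bit width drops, because a single outlier-driven scale leaves very few levels for the bulk of the weights. That gap is the kind of low-bit degradation that PTQ methods aim to reduce.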
