TEQ: Trainable Equivalent Transformation for Quantization of LLMs
Wenhua Cheng, Yiyang Cai, Kaokao Lv, Haihao Shen

TL;DR
TEQ introduces a lightweight, trainable transformation that enables low-precision quantization of large language models without sacrificing accuracy or adding inference overhead, matching state-of-the-art performance.
Contribution
The paper proposes TEQ, a novel trainable equivalent transformation that preserves FP32 output precision during low-bit quantization of LLMs, requiring minimal training and no additional inference cost.
Findings
Achieves results on par with state-of-the-art (SOTA) quantization methods on typical LLMs.
Requires only 1K training steps and fewer than 0.1% of the original model's trainable parameters.
Compatible with other methods for enhanced performance.
Abstract
As large language models (LLMs) become more prevalent, there is a growing need for new and improved quantization methods that can meet the computational demands of these modern architectures while maintaining accuracy. In this paper, we present TEQ, a trainable equivalent transformation that preserves the FP32 precision of the model output while taking advantage of low-precision quantization, especially 3- and 4-bit weight-only quantization. The training process is lightweight, requiring only 1K steps and fewer than 0.1 percent of the original model's trainable parameters. Furthermore, the transformation does not add any computational overhead during inference. Our results are on par with the state-of-the-art (SOTA) methods on typical LLMs. Our approach can be combined with other methods to achieve even better performance. The code is available at…
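The abstract does not spell out the transformation itself, so the following is only a minimal sketch of what an "equivalent transformation" can look like, assuming one scale per input channel of a linear layer: activations are divided by the scale and the matching weight rows are multiplied by it, which leaves the full-precision output unchanged but changes what the quantizer sees. All names here (`matvec`, `apply_scales`, `quantize_rtn`) are hypothetical, the scales are random stand-ins for values that would be learned in training, and the round-to-nearest quantizer is a generic choice rather than the paper's exact scheme.

```python
import random

def matvec(x, w):
    """Compute y = x @ w for a vector x (n_in) and a matrix w (n_in x n_out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def apply_scales(x, w, s):
    """Assumed equivalent transformation: x_i / s_i and row i of w times s_i."""
    x_t = [xi / si for xi, si in zip(x, s)]
    w_t = [[wij * si for wij in row] for row, si in zip(w, s)]
    return x_t, w_t

def quantize_rtn(w, bits=4):
    """Symmetric round-to-nearest fake quantization, one step size per output column."""
    qmax = 2 ** (bits - 1) - 1
    n_in, n_out = len(w), len(w[0])
    out = [[0.0] * n_out for _ in range(n_in)]
    for j in range(n_out):
        max_abs = max(abs(w[i][j]) for i in range(n_in)) or 1.0
        step = max_abs / qmax
        for i in range(n_in):
            q = max(-qmax - 1, min(qmax, round(w[i][j] / step)))
            out[i][j] = q * step  # dequantized value
    return out

random.seed(0)
n_in, n_out = 8, 4
x = [random.gauss(0, 1) for _ in range(n_in)]
w = [[random.gauss(0, 1) for _ in range(n_out)] for _ in range(n_in)]
s = [random.uniform(0.5, 2.0) for _ in range(n_in)]  # stand-in for trained scales

# Full precision: the transformed pair reproduces the original output.
y_ref = matvec(x, w)
x_t, w_t = apply_scales(x, w, s)
y_equiv = matvec(x_t, w_t)
max_diff = max(abs(a - b) for a, b in zip(y_ref, y_equiv))

# Low precision: the error now depends on s, which is what training would tune.
y_q = matvec(x_t, quantize_rtn(w_t, bits=4))
quant_err = max(abs(a - b) for a, b in zip(y_ref, y_q))
print(f"equivalence gap: {max_diff:.2e}, 4-bit output error: {quant_err:.4f}")
```

The sketch keeps the division by `s` explicit for readability. For the transformation to add no inference cost, as the abstract states, that division would presumably be folded into the parameters of the preceding layer ahead of time.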
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
