Athena: Efficient Block-Wise Post-Training Quantization for Large Language Models Using Second-Order Matrix Derivative Information

Yanshu Wang; Wenyang He; Tong Yang

arXiv:2405.17470·cs.LG·May 29, 2024

Open Access

TL;DR

Athena introduces a block-wise post-training quantization method for large language models that uses second-order derivative information to optimize compression while preserving model accuracy.

Contribution

It presents a novel quantization algorithm leveraging second-order matrix derivatives for efficient compression of LLMs without retraining.

Findings

01 Achieves significant model size reduction.

02 Maintains high accuracy post-quantization.

03 Outperforms traditional uniform quantization methods.

Abstract

Large Language Models (LLMs) have significantly advanced natural language processing tasks such as machine translation, text generation, and sentiment analysis. However, their large size, often consisting of billions of parameters, poses challenges for storage, computation, and deployment, particularly in resource-constrained environments like mobile devices and edge computing platforms. Effective compression and quantization techniques are crucial for addressing these issues, reducing memory footprint and computational requirements without significantly compromising performance. Traditional methods that uniformly map parameters to compressed spaces fail to account for the uneven distribution of parameters, leading to substantial accuracy loss. In this work, we propose Athena, a novel algorithm for efficient block-wise post-training quantization of LLMs. Athena leverages Second-Order…
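
To illustrate the abstract's point about uniform mapping and unevenly distributed parameters, here is a minimal sketch in pure Python. It is not Athena's algorithm and uses no second-order information: it only compares plain round-to-nearest quantization with one range for the whole tensor against the same quantizer applied per block. The function names, the 4-bit width, the block size of 64, and the synthetic weights (small Gaussian values plus a few large outliers) are all assumptions made for illustration.

```python
import random


def quantize_uniform(values, bits=4):
    """Round-to-nearest uniform quantization over the min..max range of `values`."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # constant input: nothing to quantize
    scale = (hi - lo) / levels
    return [lo + round((v - lo) / scale) * scale for v in values]


def quantize_blockwise(values, bits=4, block_size=64):
    """Apply the same uniform quantizer independently to each contiguous block."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(quantize_uniform(values[start:start + block_size], bits))
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic, hypothetical "weights": mostly small values with a few large outliers.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]
for i in range(0, 4096, 512):
    weights[i] = 1.0 if (i // 512) % 2 == 0 else -1.0

err_tensor = mse(weights, quantize_uniform(weights))
err_block = mse(weights, quantize_blockwise(weights))

print(f"per-tensor 4-bit MSE: {err_tensor:.6f}")
print(f"block-wise 4-bit MSE: {err_block:.6f}")
```

With a single range, the outliers stretch the quantization grid so that the many small weights share only a couple of levels. With per-block ranges, only the blocks that contain an outlier pay that cost, which is why block-wise schemes are a common starting point for post-training quantization.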

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques