AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent

Jing Liu; Toshiaki Koike-Akino; Ye Wang; Hassan Mansour; Matthew Brand

arXiv:2506.10205·cs.LG·December 2, 2025

TL;DR

This paper introduces AWP, a unified method for layer-wise, post-training, activation-aware weight pruning and quantization of large language models, built on projected gradient descent.

Contribution

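The paper proposes a unified approach to pruning and quantization based on projected gradient descent, provides theoretical convergence guarantees for the pruning case, and reports that it outperforms state-of-the-art LLM pruning and quantization methods.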

Findings

01. In the reported experiments, AWP outperforms state-of-the-art LLM pruning and quantization methods.

02. Theoretical convergence guarantees are provided for the pruning case.

03. Experimental results support the effectiveness of AWP on LLMs.

Abstract

To address the enormous size of Large Language Models (LLMs), model compression methods, such as quantization and pruning, are often deployed, especially on edge devices. In this work, we focus on layer-wise post-training quantization and pruning. Drawing connections between activation-aware weight pruning and sparse approximation problems, and motivated by the success of Iterative Hard Thresholding (IHT), we propose a unified method for Activation-aware Weight pruning and quantization via Projected gradient descent (AWP). Our experiments demonstrate that AWP outperforms state-of-the-art LLM pruning and quantization methods. Theoretical convergence guarantees of the proposed method for pruning are also provided.
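
To make the connection between activation-aware pruning and Iterative Hard Thresholding concrete, here is a minimal NumPy sketch of projected gradient descent for a single layer. It is an illustration under stated assumptions, not the authors' implementation: the objective 0.5 * ||(W - W_hat) X||_F^2, the row-wise top-k sparsity constraint, the step size 1 / lambda_max(X X^T), the magnitude-pruning initialization, and the names `hard_threshold_rows`, `iht_prune`, and `recon_error` are all choices made here for illustration and are not taken from the abstract.

```python
import numpy as np

def hard_threshold_rows(W, k):
    """Projection step: keep the k largest-magnitude entries per row, zero the rest."""
    out = np.zeros_like(W)
    idx = np.argpartition(np.abs(W), -k, axis=1)[:, -k:]  # column indices of top-k per row
    rows = np.arange(W.shape[0])[:, None]
    out[rows, idx] = W[rows, idx]
    return out

def recon_error(W, W_hat, X):
    """Activation-aware reconstruction error ||(W - W_hat) X||_F^2."""
    return np.linalg.norm((W - W_hat) @ X) ** 2

def iht_prune(W, X, k, n_iters=50):
    """Hypothetical sketch: projected gradient descent with a hard-thresholding projection.

    W: (d_out, d_in) dense weights; X: (d_in, n_samples) calibration activations.
    """
    C = X @ X.T                               # activation Gram matrix, (d_in, d_in)
    step = 1.0 / np.linalg.eigvalsh(C)[-1]    # assumed step size: 1 / largest eigenvalue
    W_hat = hard_threshold_rows(W, k)         # assumed init: plain magnitude pruning
    for _ in range(n_iters):
        grad = (W_hat - W) @ C                # gradient of 0.5 * ||(W - W_hat) X||_F^2
        W_hat = hard_threshold_rows(W_hat - step * grad, k)
    return W_hat

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
X = rng.standard_normal((16, 64))
W_mag = hard_threshold_rows(W, 8)             # magnitude-only baseline
W_iht = iht_prune(W, X, 8)
print(recon_error(W, W_mag, X), recon_error(W, W_iht, X))
```

With this step size, each iteration cannot increase the objective, so the result is never worse than the magnitude-pruning starting point on the calibration data. Under the same assumptions, a quantization variant would replace `hard_threshold_rows` with a projection onto a quantization grid (for example, rounding to the nearest representable level); how the paper combines the two projections is not specified in the abstract.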

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis

Methods: Focus · Pruning