LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models

Yupeng Su; Ziyi Guan; Xiaoqun Liu; Tianlai Jin; Dongkuan Wu; Zhengfei Chen; Graziano Chesi; Ngai Wong; Hao Yu

arXiv:2408.10631 · cs.LG · December 11, 2025


TL;DR

LLM-Barber is a one-shot pruning method for large language models that rebuilds the sparsity masks of pruned models without retraining or weight reconstruction, preserving accuracy while keeping computational cost low.

Contribution

The paper presents the first use of the weight-gradient product as a pruning metric in LLM post-training pruning, enabling efficient one-shot pruning with block-aware error optimization across Self-Attention and MLP blocks.
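The weight-gradient product metric can be illustrated with a minimal NumPy sketch. It assumes the importance score is the elementwise magnitude |W ⊙ ∂L/∂W| and that weights compete only within their own output row; both helper names are hypothetical, and this is not the code from the official repository. The toy row contrasts the score with plain magnitude pruning.

```python
import numpy as np

def weight_gradient_scores(weight: np.ndarray, grad: np.ndarray) -> np.ndarray:
    """Hypothetical helper: elementwise importance |W * dL/dW|."""
    return np.abs(weight * grad)

def mask_from_scores(scores: np.ndarray, sparsity: float) -> np.ndarray:
    """Keep the highest-scoring weights in each output row, prune the rest.

    Per-row comparison groups are an assumption of this sketch.
    Returns a boolean mask where True means "keep".
    """
    _, cols = scores.shape
    n_prune = int(cols * sparsity)
    mask = np.ones_like(scores, dtype=bool)
    if n_prune == 0:
        return mask
    # Column indices of the n_prune lowest scores in every row.
    prune_idx = np.argsort(scores, axis=1)[:, :n_prune]
    np.put_along_axis(mask, prune_idx, False, axis=1)
    return mask

# Toy row: large weights with small gradients vs. small weights with large gradients.
W = np.array([[0.5, -2.0, 0.1, 1.0]])
G = np.array([[4.0, 0.1, 10.0, 0.3]])
print(mask_from_scores(weight_gradient_scores(W, G), 0.5).tolist())  # [[True, False, True, False]]
print(mask_from_scores(np.abs(W), 0.5).tolist())                     # [[False, True, False, True]]
```

On this toy row the two metrics disagree completely: magnitude pruning discards the first and third weights, while the weight-gradient score keeps them because their gradients are large.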

Findings

01. Prunes LLaMA and OPT models (7B to 13B parameters) in 30 minutes on a single GPU.
02. Achieves state-of-the-art perplexity and zero-shot performance.
03. Reduces computational complexity compared to second-order methods.

Abstract

Large language models (LLMs) have seen substantial growth, necessitating efficient model pruning techniques. Existing post-training pruning methods primarily measure weight importance in converged dense models, often overlooking changes in weight significance during the pruning process, leading to performance degradation. To address this issue, we present LLM-Barber (Block-Aware Rebuilder for Sparsity Mask in One-Shot), a novel one-shot pruning framework that rebuilds the sparsity mask of pruned models without any retraining or weight reconstruction. LLM-Barber incorporates block-aware error optimization across Self-Attention and MLP blocks, facilitating global performance optimization. We are the first to employ the product of weights and gradients as a pruning metric in the context of LLM post-training pruning. This enables accurate identification of weight importance in massive…
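The block-aware mask rebuilding idea can be sketched under heavy simplifications. In this sketch a single linear layer stands in for a Self-Attention or MLP block, the block error is the squared difference between pruned and dense outputs on calibration inputs X, its gradient is derived analytically (∂E/∂W' = 2(W'X − WX)Xᵀ at W' = W ⊙ M) instead of with autograd, and the mask is re-selected row by row at unchanged sparsity using the weight-gradient score. All function names are hypothetical, and the paper's actual rebuilding rule (for example, how many weights it swaps and how gradients pass through full transformer blocks) is not reproduced here.

```python
import numpy as np

def block_error(weight, mask, x):
    """Squared Frobenius error between pruned and dense outputs on inputs x."""
    return float(np.sum(((weight * mask) @ x - weight @ x) ** 2))

def block_error_grad(weight, mask, x):
    """dE/dW' for E = ||W' X - W X||_F^2, evaluated at W' = W * mask.

    Shapes: weight and mask (boolean) are (out, in); x is (in, n_samples).
    """
    residual = (weight * mask) @ x - weight @ x  # (out, n_samples)
    return 2.0 * residual @ x.T                  # (out, in)

def rebuild_mask(weight, mask, x):
    """Re-select kept weights per row by |W * dE/dW'|.

    The dense W is used so pruned positions are scored too; each row keeps
    the same number of pruned weights as the input mask.
    """
    scores = np.abs(weight * block_error_grad(weight, mask, x))
    new_mask = np.ones_like(mask, dtype=bool)
    for i in range(weight.shape[0]):
        n_prune = int(np.count_nonzero(~mask[i]))
        if n_prune:
            # Prune the n_prune lowest-scoring positions in this row.
            new_mask[i, np.argsort(scores[i])[:n_prune]] = False
    return new_mask

# Magnitude pruning keeps the larger weight (3.0), whose input is tiny.
W = np.array([[1.0, 3.0]])
X = np.array([[10.0], [0.1]])
M0 = np.array([[False, True]])
M1 = rebuild_mask(W, M0, X)
print(M1.tolist())                                                   # [[True, False]]
print(round(block_error(W, M0, X), 4), round(block_error(W, M1, X), 4))  # 100.0 0.09
```

In this toy case the gradient of the block error reveals that the first weight carries most of the output, so the rebuilt mask swaps the two weights and the block error drops sharply without changing any weight values.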


Code & Models

Repositories

yupengsu/llm-barber (PyTorch, official)


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: LLaMA · Pruning · OPT · Focus