MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures

Jiayu Qin; Jianchao Tan; Kefeng Zhang; Xunliang Cai; Wei Wang

arXiv:2502.14008 · cs.CL · February 21, 2025

TL;DR

MaskPrune introduces a mask-based pruning method for large language models that maintains uniform layer-wise structures, improving inference efficiency without sacrificing performance.

Contribution

The paper proposes a novel mask learning paradigm based on minimax optimization that yields a uniform pruned structure across layers, addressing the layer-to-layer structural heterogeneity introduced by most prior optimization-based structured pruning methods.
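To make the idea of minimax mask learning concrete, here is a minimal sketch of a generic Lagrangian formulation. It is not the paper's objective: the quadratic pruning penalty, the density budget, the step sizes, and the function name `learn_masks` are all assumptions chosen for illustration. The masks take descent steps on the penalty while a multiplier `lam` takes ascent steps on the constraint violation.

```python
def learn_masks(importance, target_density, steps=2000, lr_mask=0.05, lr_lam=0.5):
    """Toy projected gradient descent-ascent on a Lagrangian (illustrative only).

    Assumed objective, not taken from the paper:
        min over masks, max over lam of
            sum_i w_i * (1 - m_i)^2  +  lam * (mean(m) - target_density)
    """
    n = len(importance)
    masks = [1.0] * n   # start with every unit kept
    lam = 0.0           # Lagrange multiplier for the density budget
    for _ in range(steps):
        # Constraint violation, measured before the mask update.
        density_gap = sum(masks) / n - target_density
        # Min player: gradient step on the masks, projected back onto [0, 1].
        masks = [
            min(1.0, max(0.0, m - lr_mask * (2.0 * w * (m - 1.0) + lam / n)))
            for m, w in zip(masks, importance)
        ]
        # Max player: gradient ascent on the multiplier.
        lam += lr_lam * density_gap
    return masks, lam


# Hypothetical importance scores for four prunable units.
masks, lam = learn_masks([4.0, 2.0, 1.0, 0.5], target_density=0.5)
print([round(m, 3) for m in masks], round(lam, 3))
```

In this toy setting the multiplier grows until the average mask value meets the budget, and units with lower importance are pushed toward zero first. A real LLM pruning setup would replace the quadratic penalty with the language-modeling loss and add whatever constraints enforce identical shapes across layers.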

Findings

01. Maintains high performance with uniform pruned structures.

02. Outperforms existing state-of-the-art pruning methods.

03. Enhances inference efficiency through structured sparsity.

Abstract

The remarkable performance of large language models (LLMs) in various language tasks has attracted considerable attention. However, the ever-increasing size of these models presents growing challenges for deployment and inference. Structured pruning, an effective model compression technique, is gaining increasing attention due to its ability to enhance inference efficiency. Nevertheless, most previous optimization-based structured pruning methods sacrifice the uniform structure across layers for greater flexibility to maintain performance. The heterogeneous structure hinders the effective utilization of off-the-shelf inference acceleration techniques and impedes efficient configuration for continued training. To address this issue, we propose a novel masking learning paradigm based on minimax optimization to obtain the uniform pruned structure by optimizing the masks under sparsity…
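The contrast between heterogeneous and uniform pruned structures described in the abstract can be illustrated with a small sketch. The scores below are made-up numbers, and both selection rules are simple top-k heuristics assumed for illustration; they are not the mask learning procedure the paper proposes. The point is only to show what "uniform across layers" means for the resulting shapes.

```python
def global_topk_counts(scores, keep_total):
    """Heterogeneous baseline: keep the highest-scoring units model-wide.

    Returns how many units survive in each layer; the counts can differ.
    """
    flat = [(s, layer) for layer, row in enumerate(scores) for s in row]
    flat.sort(key=lambda pair: pair[0], reverse=True)
    counts = [0] * len(scores)
    for _, layer in flat[:keep_total]:
        counts[layer] += 1
    return counts


def uniform_topk_indices(scores, keep_per_layer):
    """Uniform structure: every layer keeps the same number of units."""
    kept = []
    for row in scores:
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        kept.append(sorted(ranked[:keep_per_layer]))
    return kept


# Hypothetical per-unit importance scores for three layers of four units.
scores = [
    [0.9, 0.8, 0.7, 0.1],
    [0.1, 0.6, 0.1, 0.2],
    [0.5, 0.05, 0.4, 0.3],
]

hetero = global_topk_counts(scores, keep_total=6)
uniform = uniform_topk_indices(scores, keep_per_layer=2)
print("heterogeneous kept-per-layer:", hetero)
print("uniform kept indices:", uniform)
```

With the global rule, each layer ends up with a different width, so every layer needs its own shape in the model configuration. With the uniform rule, a single width describes all layers, which is the property the abstract ties to off-the-shelf inference acceleration and simpler configuration for continued training.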

Taxonomy

Topics: Advancements in Photolithography Techniques · Advanced Surface Polishing Techniques

Methods: Softmax · Attention Is All You Need · Pruning