MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures
Jiayu Qin, Jianchao Tan, Kefeng Zhang, Xunliang Cai, Wei Wang

TL;DR
MaskPrune introduces a mask-based pruning method for large language models that maintains uniform layer-wise structures, improving inference efficiency without sacrificing performance.
Contribution
The paper proposes a novel mask-learning paradigm based on minimax optimization that achieves uniform structured pruning in LLMs, addressing the layer-wise structural heterogeneity produced by prior optimization-based pruning methods.
Findings
Maintains high performance with uniform pruned structures.
Outperforms existing state-of-the-art pruning methods.
Enhances inference efficiency through structured sparsity.
Abstract
The remarkable performance of large language models (LLMs) in various language tasks has attracted considerable attention. However, the ever-increasing size of these models presents growing challenges for deployment and inference. Structured pruning, an effective model compression technique, is gaining increasing attention due to its ability to enhance inference efficiency. Nevertheless, most previous optimization-based structured pruning methods sacrifice the uniform structure across layers for greater flexibility to maintain performance. The heterogeneous structure hinders the effective utilization of off-the-shelf inference acceleration techniques and impedes efficient configuration for continued training. To address this issue, we propose a novel masking learning paradigm based on minimax optimization to obtain the uniform pruned structure by optimizing the masks under sparsity…
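As a rough illustration of the minimax idea in the abstract, here is a sketch using a generic Lagrangian formulation. It is not the authors' implementation, and the abstract is truncated before the actual objective is stated. Each layer's structural units (for example, attention heads) get a learnable mask logit. The same keep ratio is enforced in every layer through one Lagrange multiplier per layer: mask logits are updated by gradient descent and multipliers by projected gradient ascent. The function name `uniform_mask_prune`, the per-unit `importance` scores, and the quadratic surrogate loss are all hypothetical stand-ins for a real model's task loss. A final per-layer top-k step gives every layer the same number of kept units, which is the uniform structure the paper targets.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def uniform_mask_prune(importance, keep_ratio, steps=500, lr_theta=0.5, lr_lam=0.5):
    """Toy gradient descent-ascent on per-layer mask logits (hypothetical sketch).

    importance: list of layers, each a list of non-negative scores standing in
        for how much the task loss depends on each head/channel (assumption).
    keep_ratio: fraction of units that EVERY layer keeps (uniform across layers).
    Returns (hard 0/1 masks per layer, final Lagrange multipliers).
    """
    n_layers = len(importance)
    n_units = len(importance[0])
    assert all(len(row) == n_units for row in importance), "layers must have equal width"
    theta = [[0.0] * n_units for _ in range(n_layers)]  # mask logits
    lam = [0.0] * n_layers  # one multiplier per layer

    for _ in range(steps):
        for l in range(n_layers):
            m = [sigmoid(t) for t in theta[l]]
            # Per-layer constraint g_l = mean(m_l) - keep_ratio <= 0.
            g = sum(m) / n_units - keep_ratio
            for i in range(n_units):
                # Gradient of the surrogate imp*(1-m)^2 + lam*g with respect to m.
                dL_dm = -2.0 * importance[l][i] * (1.0 - m[i]) + lam[l] / n_units
                # Chain rule through the sigmoid, then a descent step on the logit.
                theta[l][i] -= lr_theta * dL_dm * m[i] * (1.0 - m[i])
            # Projected ascent step: the multiplier grows while the layer keeps too much.
            lam[l] = max(0.0, lam[l] + lr_lam * g)

    # Hard mask: the same number k of top-scoring units in every layer.
    k = round(keep_ratio * n_units)
    hard = []
    for l in range(n_layers):
        order = sorted(range(n_units), key=lambda i: theta[l][i], reverse=True)
        keep = set(order[:k])
        hard.append([1 if i in keep else 0 for i in range(n_units)])
    return hard, lam


if __name__ == "__main__":
    random.seed(0)
    imp = [[random.random() for _ in range(8)] for _ in range(4)]
    masks, lam = uniform_mask_prune(imp, keep_ratio=0.5)
    for row in masks:
        print(row)  # every row keeps exactly 4 of 8 units
```

Because every layer keeps the same number of units, all pruned layers share one shape. This is the property the abstract ties to reusing off-the-shelf inference acceleration and configuring continued training, whereas a per-layer-varying result would not have it.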
