MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs

Yan Sun; Qixin Zhang; Zhiyuan Yu; Xikun Zhang; Li Shen; Dacheng Tao

arXiv:2506.12876·cs.LG·May 14, 2026

MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs

Yan Sun, Qixin Zhang, Zhiyuan Yu, Xikun Zhang, Li Shen, Dacheng Tao

PDF

1 Repo 1 Video

TL;DR

MaskPro introduces a probabilistic learning framework for efficient (N:M)-sparsity in large language models, improving inference speed and memory use with robust training methods.

Contribution

It proposes a novel linear-space probabilistic approach for (N:M)-sparsity, addressing errors and training costs of prior methods.

Findings

01

MaskPro achieves superior sparsity performance in LLMs.

02

The method demonstrates excellent scalability and robustness.

03

Extensive experiments validate its effectiveness.

Abstract

The rapid scaling of large language models~(LLMs) has made inference efficiency a primary bottleneck in the practical deployment. To address this, semi-structured sparsity offers a promising solution by strategically retaining $N$ elements out of every $M$ weights, thereby enabling hardware-friendly acceleration and reduced memory. However, existing (N:M)-compatible approaches typically fall into two categories: rule-based layerwise greedy search, which suffers from considerable errors, and gradient-driven combinatorial learning, which incurs prohibitive training costs. To tackle these challenges, we propose a novel linear-space probabilistic framework named MaskPro, which aims to learn a prior categorical distribution for every $M$ consecutive weights and subsequently leverages this distribution to generate the (N:M)-sparsity throughout an $N$ -way sampling without replacement.…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

woodenchild95/Maskpro.git
github

Videos

MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs· slideslive