LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models

Mohammad Mozaffari; Younes Hourri; Mohammad Rastegari; Mahyar Najibi

arXiv:2605.17289·cs.LG·May 19, 2026


TL;DR

LEAP introduces a learnable, end-to-end unstructured pruning method for large language models that improves accuracy at high sparsity levels by using a novel Bernoulli-via-Gumbel-sigmoid relaxation.

Contribution

LEAP replaces intractable parameterizations with a scalable relaxation, enabling effective end-to-end unstructured pruning of large language models.
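To see why a categorical-over-patterns parameterization becomes intractable without a fixed sparsity pattern, it may help to count the candidate masks. The sketch below is illustrative only: the 2-of-4 group and the 4096-wide row are assumed example sizes, and the helper names are hypothetical, none of them taken from the paper.

```python
import math


def num_patterns(group_size: int, kept: int) -> int:
    """Number of binary masks that keep exactly `kept` of `group_size` weights."""
    return math.comb(group_size, kept)


def num_per_weight_logits(row_width: int) -> int:
    """A per-weight relaxation needs one learnable logit per weight."""
    return row_width


# Assumed example: a small structured group (keep 2 of 4 weights).
# A categorical distribution over these patterns needs only a handful of logits.
small_group = num_patterns(4, 2)

# Assumed example: one unstructured row of 4096 weights at 50% sparsity.
# The count of valid masks is far too large to enumerate as categories.
row_width = 4096
unstructured = num_patterns(row_width, row_width // 2)

print("patterns in a 2-of-4 group:", small_group)
print("decimal digits in the unstructured pattern count:", len(str(unstructured)))
print("logits needed by a per-weight parameterization:", num_per_weight_logits(row_width))
```

The per-weight count grows linearly with the row width, while the pattern count grows combinatorially, which is the scaling gap the contribution statement refers to.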

Findings

01

LEAP improves six-task average zero-shot accuracy by 2.59 points over ADMM.

02

LEAP is effective across five LLM families from 0.5B to 8B parameters.

03

LEAP improves accuracy at both 50% and 60% sparsity.

Abstract

Unstructured sparsity is now natively accelerated by recent GPU kernels and dataflow hardware, shifting the bottleneck from inference execution to the pruning algorithm. State-of-the-art methods for unstructured LLM pruning are layer-wise surrogates derived from the Optimal Brain Surgeon principle, and they sacrifice end-to-end accuracy, especially under aggressive sparsity. End-to-end alternatives such as MaskLLM and PATCH show that learnable masks can close this gap, but their categorical-over-patterns parameterization scales with the number of valid masks per row and does not port to the unstructured setting. We introduce LEAP, which replaces this intractable parameterization with a per-weight Bernoulli-via-Gumbel-sigmoid relaxation that makes end-to-end unstructured mask learning tractable. Across five LLM families from 0.5B to 8B parameters at 50% and 60% sparsity, LEAP improves…
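As a rough illustration of what a per-weight Bernoulli-via-Gumbel-sigmoid relaxation can look like, the sketch below uses the generic Gumbel-sigmoid (binary Concrete) construction: add logistic noise to a per-weight logit, divide by a temperature, and apply a sigmoid. This summary does not give the paper's exact formulation, so the function names, the temperature value, and the 0.5 threshold are all assumptions, and no gradient or training loop is shown.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gumbel_sigmoid(logit: float, temperature: float, rng: random.Random) -> float:
    """Draw one relaxed (soft) Bernoulli sample in (0, 1) for a single weight.

    The difference of two independent Gumbel samples is logistic noise,
    which is sampled directly here as log(u) - log(1 - u).
    """
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep the logs finite
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return sigmoid((logit + logistic_noise) / temperature)


def soft_mask(logits: list[float], temperature: float, rng: random.Random) -> list[float]:
    """One independent soft mask value per weight (assumed usage: scale each weight)."""
    return [gumbel_sigmoid(a, temperature, rng) for a in logits]


def hard_mask(soft: list[float]) -> list[int]:
    """Threshold the soft mask at 0.5 (an assumed choice) to get a binary mask."""
    return [1 if m > 0.5 else 0 for m in soft]


rng = random.Random(0)
logits = [2.0, 0.0, -2.0]  # hypothetical per-weight mask logits
soft = soft_mask(logits, temperature=0.5, rng=rng)
print(soft, hard_mask(soft))
```

Under this construction, thresholding the soft sample at 0.5 keeps a weight with probability `sigmoid(logit)`, so each hard mask entry is a Bernoulli draw, while the soft value stays differentiable with respect to the logit. Lower temperatures push soft values toward 0 or 1.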
