Deterministic Differentiable Structured Pruning for Large Language Models

Weiyu Huang; Pengle Zhang; Xiaolu Zhang; Jun Zhou; Jun Zhu; Jianfei Chen

arXiv:2603.08065·cs.LG·May 12, 2026

TL;DR

This paper introduces Deterministic Differentiable Pruning (DDP), a structured pruning method for large language models that replaces stochastic mask sampling with direct optimization of a deterministic soft surrogate of the l0 objective, reducing train-test mismatch and speeding up convergence.

Contribution

The paper proposes DDP, a deterministic, mask-only optimization approach for structured pruning that is reported to outperform stochastic methods in both efficiency and accuracy on large language models.

Findings

01

Incurs as little as 1% performance loss at 20% sparsity.

02

Outperforms previous methods in structured pruning.

03

Demonstrates end-to-end inference speedups in deployment.

Abstract

Structured pruning reduces LLM inference cost by removing low-importance architectural components. This can be viewed as learning a multiplicative gate for each component under an l0 sparsity constraint. Due to the discreteness of the l0 norm, prior work typically adopts stochastic hard-concrete relaxations to enable differentiable optimization; however, this stochasticity can introduce a train-test mismatch when sampled masks are discretized for deployment and restricts masks to a bounded, near-binary range. To address this, we propose Deterministic Differentiable Pruning (DDP), a mask-only optimization method that eliminates stochasticity by directly optimizing a deterministic soft surrogate of the discrete l0 objective. Compared with prior approaches, DDP offers greater expressiveness, reduced train-test mismatch, and faster convergence. We apply our method to several dense and MoE…
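
To make the train-test mismatch described in the abstract concrete, here is a minimal sketch of the stochastic hard-concrete gate that the abstract attributes to prior work. It is not DDP's surrogate, whose exact form the abstract does not give. The stretch limits, temperature, gate parameters, and the toy `gated_sum` "layer" are all assumptions chosen for illustration.

```python
import math
import random

# Stretch limits and temperature often used with the hard-concrete
# distribution (assumed values for illustration, not from the DDP paper).
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_gate(log_alpha, rng):
    """Draw one stochastic hard-concrete gate value in [0, 1]."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)  # keep the logs finite
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    return min(1.0, max(0.0, s * (ZETA - GAMMA) + GAMMA))


def deterministic_gate(log_alpha):
    """Noise-free gate value, as typically used at evaluation time."""
    return min(1.0, max(0.0, sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA))


def expected_l0(log_alphas):
    """Differentiable expected count of non-zero gates (the relaxed l0 term)."""
    shift = BETA * math.log(-GAMMA / ZETA)
    return sum(sigmoid(a - shift) for a in log_alphas)


def gated_sum(values, gates):
    """Toy 'layer': each component's output is scaled by its gate."""
    return sum(v * g for v, g in zip(values, gates))


rng = random.Random(0)
log_alphas = [3.0, 0.5, -0.5, -3.0]  # hypothetical learned gate parameters
values = [1.0, 1.0, 1.0, 1.0]        # hypothetical component outputs

# Training sees sampled, fractional gates ...
train_out = gated_sum(values, [sample_gate(a, rng) for a in log_alphas])
# ... while deployment keeps or drops each component outright.
hard_mask = [1.0 if deterministic_gate(a) > 0.5 else 0.0 for a in log_alphas]
deploy_out = gated_sum(values, hard_mask)

print("sampled training output:", train_out)
print("discretized deployment output:", deploy_out)
print("expected l0:", expected_l0(log_alphas))
```

The sampled output changes from draw to draw, while the deployed model uses a single fixed binary mask. That gap between what is optimized and what is deployed is the mismatch the abstract says a deterministic surrogate is meant to reduce.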
