From Local to Global: Revisiting Structured Pruning Paradigms for Large Language Models

Ziyan Wang; Enmao Diao; Qi Le; Pu Wang; Minwoo Lee; Shu-ping Yeh; Evgeny Stupachenko; Hao Feng; Li Yang

arXiv:2510.18030·cs.CL·April 29, 2026

TL;DR

This paper introduces GISP, a global structured pruning method for large language models that improves efficiency and downstream task performance without intermediate fine-tuning.

Contribution

The paper presents GISP, a novel iterative, importance-based global pruning approach that stabilizes accuracy at high sparsity and supports task-specific objectives.

Findings

01. GISP reduces perplexity on WikiText-2 across multiple LLMs.

02. GISP improves downstream accuracy, especially at 40-50% sparsity.

03. Task-specific calibration enhances accuracy on decision tasks.

Abstract

Structured pruning is a practical approach to deploying large language models (LLMs) efficiently, as it yields compact, hardware-friendly architectures. However, the dominant local paradigm is task-agnostic: by optimizing layer-wise reconstruction rather than task objectives, it tends to preserve perplexity or generic zero-shot behavior but fails to capitalize on modest task-specific calibration signals, often yielding limited downstream gains. We revisit global structured pruning and present GISP (Global Iterative Structured Pruning), a post-training method that removes attention heads and MLP channels using first-order, loss-based importance scores aggregated at the structure level with block-wise normalization. Built on this global importance metric, GISP adopts an iterative schedule rather than one-shot pruning, which stabilizes accuracy at higher sparsity and mitigates perplexity…
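
The pipeline the abstract describes (first-order scores, structure-level aggregation, block-wise normalization, global ranking, iterative schedule) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the choice of L2 normalization per block, the linear sparsity schedule, and the toy `grad_fn` callback are all assumptions made for illustration, and each row of a weight matrix stands in for one prunable structure (an attention head or MLP channel).

```python
import numpy as np

def structure_scores(weight, grad):
    """First-order (Taylor) importance per structure: |sum over a row of w * g|."""
    return np.abs((weight * grad).sum(axis=1))

def normalize_block(scores, eps=1e-12):
    """Assumed block-wise normalization: scale one block's scores to unit L2 norm."""
    return scores / (np.linalg.norm(scores) + eps)

def global_prune_step(weights, grads, masks, n_remove):
    """Remove the n_remove lowest-scoring live structures, ranked across ALL blocks."""
    candidates = []
    for b, (w, g) in enumerate(zip(weights, grads)):
        s = normalize_block(structure_scores(w, g) * masks[b])
        candidates += [(s[i], b, i) for i in np.flatnonzero(masks[b])]
    candidates.sort(key=lambda t: t[0])
    for _, b, i in candidates[:n_remove]:
        masks[b][i] = False
    return masks

def iterative_global_prune(weights, grad_fn, target_sparsity, steps):
    """Prune toward target_sparsity over several steps (assumed linear schedule).

    grad_fn is a hypothetical callback that returns loss gradients for the
    currently masked weights, so scores are recomputed after every step.
    """
    masks = [np.ones(w.shape[0], dtype=bool) for w in weights]
    total = sum(m.size for m in masks)
    for t in range(1, steps + 1):
        goal = int(round(total * target_sparsity * t / steps))
        removed = total - sum(int(m.sum()) for m in masks)
        pruned = [w * m[:, None] for w, m in zip(weights, masks)]
        grads = grad_fn(pruned)
        masks = global_prune_step(pruned, grads, masks, goal - removed)
    return masks
```

In this sketch, `grad_fn` would in practice run a calibration batch through the model and return gradients of the chosen loss, which is where a task-specific objective could be plugged in. The sketch does not guard against emptying an entire block; a real implementation would likely need a per-block minimum.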

Code & Models

Repositories

uncc-efficient-ai/GISP (GitHub)