Prune, Update and Trim: Robust Structured Pruning for Large Language Models

Diego Coello de Portugal Mecke; Tom Hanika; Lars Schmidt-Thieme

arXiv:2605.18331 · cs.LG · May 19, 2026

TL;DR

Putri is a structured pruning method for large language models that updates the un-pruned FFN weights, prunes the FFN layers sequentially, and removes individual attention heads. It reports state-of-the-art performance, especially at high sparsity levels.

Contribution

The paper introduces Putri, a simple yet effective post-training pruning (PTP) technique that outperforms existing methods on large language models across a range of sparsity levels.

Findings

01 Putri achieves state-of-the-art performance in structured pruning of LLMs.

02 It effectively prunes models at extreme sparsity ratios.

03 The method generalizes well across different models and datasets.

Abstract

Large Language Models (LLMs) have experienced significant growth and development in recent years. However, performing inference with LLMs remains costly, especially for long-context inference or on resource-constrained devices. This motivates the development of new post-training pruning (PTP) methods. These methods reduce an LLM's resource requirements by removing a substantial part of the model's parameters. The discarded weights are selected according to their impact on the model's performance. Current PTP methods prune models by removing the least informative hidden nodes from the FFN layers and the least important attention layers. We propose Putri, a PTP method that introduces three changes to the state of the art. First, we update the un-pruned weights of the FFN to compensate for the introduced pruning error. Second, the FFN layers are pruned sequentially, taking into account the updates…
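The first change described in the abstract, updating the un-pruned FFN weights to compensate for the pruning error, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the importance score, the least-squares update, the function names (`prune_ffn_with_update`, `rel_error`), and the toy dimensions are all assumptions chosen for illustration. The sketch prunes hidden nodes of a single FFN and refits the remaining output-projection rows so that the pruned layer reproduces the dense layer's output on calibration data.

```python
import numpy as np


def prune_ffn_with_update(H, W_out, keep_ratio):
    """Prune FFN hidden nodes, then refit the kept output weights.

    H      : (n_samples, d_hidden) hidden activations on calibration data
    W_out  : (d_hidden, d_out) output projection of the FFN
    Returns the kept indices, the naive (prune-only) weights, and the
    updated weights. All names here are hypothetical.
    """
    d_hidden = H.shape[1]
    n_keep = max(1, int(round(keep_ratio * d_hidden)))

    # Assumed importance score: activation norm times output-row norm.
    score = np.linalg.norm(H, axis=0) * np.linalg.norm(W_out, axis=1)
    keep = np.sort(np.argsort(score)[-n_keep:])

    Y = H @ W_out            # dense output that the pruned layer should match
    H_k = H[:, keep]         # activations of the surviving hidden nodes
    W_naive = W_out[keep]    # pruning without any weight update

    # Least-squares update: minimise ||H_k W - Y||_F over the kept rows.
    W_upd, *_ = np.linalg.lstsq(H_k, Y, rcond=None)
    return keep, W_naive, W_upd


def rel_error(H_k, W, Y):
    """Relative Frobenius error of the pruned layer's output."""
    return np.linalg.norm(H_k @ W - Y) / np.linalg.norm(Y)


# Toy calibration data (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))
W_in = rng.normal(size=(16, 64))
W_out = rng.normal(size=(64, 8))
H = np.maximum(X @ W_in, 0.0)   # ReLU hidden activations
Y = H @ W_out

keep, W_naive, W_upd = prune_ffn_with_update(H, W_out, keep_ratio=0.5)
err_naive = rel_error(H[:, keep], W_naive, Y)
err_upd = rel_error(H[:, keep], W_upd, Y)
print(f"kept {keep.size} of {H.shape[1]} hidden nodes")
print(f"relative error without update: {err_naive:.4f}")
print(f"relative error with update:    {err_upd:.4f}")
```

Because the least-squares solution minimises the reconstruction error over all possible weights for the kept nodes, the updated error on the calibration data cannot exceed the prune-only error. The sequential pruning of FFN layers and the attention-head removal mentioned in the summary are not covered by this sketch.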

Code & Models

Repositories

Coello-dev/Putri (GitHub)
