Improved Methods for Model Pruning and Knowledge Distillation

Wei Jiang; Anying Fu; Youling Zhang

arXiv:2505.14052·cs.CL·May 21, 2025

Open Access

TL;DR

This paper introduces MAMA Pruning, a novel method for reducing large language models' size and complexity while maintaining performance, using weight, bias, and reward-based indicators during pruning.

Contribution

The paper presents MAMA Pruning, an improved pruning technique that effectively reduces model size and computational load with minimal performance loss, outperforming existing methods.

Findings

1. MAMA Pruning maintains performance at high pruning levels.

2. It outperforms state-of-the-art pruning methods.

3. It is effective across various NLP tasks.

Abstract

Model pruning is a performance optimization technique for large language models such as R1 or o3-mini. It aims to identify and remove neurons and connections that are unlikely to contribute during the human-computer interaction phase. However, existing pruning methods often cause significant performance degradation or require extensive retraining and fine-tuning. Our goal is to obtain a much smaller and faster knowledge-distilled model that can quickly generate content almost as good as that of the unpruned model. We propose MAMA Pruning, short for Movement and Magnitude Analysis, an improved pruning method that effectively reduces model size and computational complexity while maintaining performance comparable to the original unpruned model, even at extreme pruning levels. The improved method is based on weights, bias fixed in the pre-training phase, and GRPO rewards verified…
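
To make the idea of combining magnitude and movement signals concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not give MAMA Pruning's scoring rule, so the linear blend, the `alpha` weight, and all function names are hypothetical, and the bias and GRPO reward terms are left out. The movement term uses the common first-order signal `-w * grad`, which is positive when a gradient step would push a weight away from zero.

```python
import numpy as np

def magnitude_score(weights):
    # Magnitude criterion: larger absolute weights are treated as more important.
    return np.abs(weights)

def movement_score(weights, grads):
    # First-order movement signal: positive when a gradient-descent step
    # would move the weight away from zero, negative when toward zero.
    return -weights * grads

def combined_score(weights, grads, alpha=0.5):
    # ASSUMPTION: a simple linear blend chosen for illustration only;
    # the source does not specify how the two signals are combined.
    return alpha * magnitude_score(weights) + (1.0 - alpha) * movement_score(weights, grads)

def prune_mask(scores, sparsity):
    # Boolean mask that drops the `sparsity` fraction of entries with the lowest scores.
    flat = scores.ravel()
    k = int(round(sparsity * flat.size))  # number of entries to remove
    mask = np.ones(flat.size, dtype=bool)
    if k > 0:
        drop = np.argpartition(flat, k - 1)[:k]  # indices of the k smallest scores
        mask[drop] = False
    return mask.reshape(scores.shape)

# Toy usage: prune half of a 2x2 weight matrix.
weights = np.array([[0.5, -0.1], [0.05, -2.0]])
grads = np.array([[-0.2, 0.3], [0.0, 0.1]])
mask = prune_mask(combined_score(weights, grads), sparsity=0.5)
pruned = weights * mask
```

In this toy case the two smallest-scoring weights (`-0.1` and `0.05`) are zeroed and the other two are kept. A real pipeline would accumulate the movement signal over many training steps rather than use a single gradient.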

Taxonomy

Topics: Advanced Computational Techniques and Applications

Methods: Pruning