Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa, Ganesh Venkatesh, Mike Lasby, Nish Sinnadurai, Sean Lie

TL;DR
This paper introduces self-data distillation to recover the quality of pruned large language models. The method outperforms standard supervised fine-tuning, and the pruned model needs fewer FLOPs at inference.
Contribution
It proposes a self-data distillation approach for fine-tuning pruned models that preserves the original model's knowledge, reducing both the quality loss from pruning and the catastrophic forgetting caused by supervised fine-tuning.
Findings
Self-data distillation outperforms standard supervised fine-tuning.
Retains 91.2% of the original model's accuracy after pruning, compared to 81.7% with standard supervised fine-tuning.
Reduces FLOPs by 16.3%, improving inference efficiency.
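Both headline figures are ratios against the unpruned baseline. The sketch below shows how such percentages are typically computed. The function names and input values are illustrative assumptions, not the paper's evaluation code or its measurements.

```python
def accuracy_retained(pruned_acc: float, original_acc: float) -> float:
    """Percentage of the original model's accuracy that the pruned model keeps."""
    if original_acc <= 0:
        raise ValueError("original_acc must be positive")
    return 100.0 * pruned_acc / original_acc


def flops_reduction(pruned_flops: float, original_flops: float) -> float:
    """Percentage of FLOPs saved relative to the original model."""
    if original_flops <= 0:
        raise ValueError("original_flops must be positive")
    return 100.0 * (1.0 - pruned_flops / original_flops)


# Hypothetical inputs chosen only to illustrate the arithmetic.
retained = accuracy_retained(pruned_acc=60.0, original_acc=80.0)
saved = flops_reduction(pruned_flops=75.0, original_flops=100.0)
```

A retention figure is only meaningful when both accuracies come from the same benchmark suite, so any comparison between recovery methods should hold the evaluation set fixed.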
Abstract
Large language models have driven significant progress in natural language processing, but their deployment requires substantial compute and memory resources. As models scale, compression techniques become essential for balancing model quality with computational efficiency. Structured pruning, which removes less critical components of the model, is a promising strategy for reducing complexity. However, one-shot pruning often results in significant quality degradation, particularly in tasks requiring multi-step reasoning. To recover lost quality, supervised fine-tuning (SFT) is commonly applied, but it can lead to catastrophic forgetting by shifting the model's learned data distribution. Therefore, addressing the degradation from both pruning and SFT is essential to preserve the original model's quality. In this work, we utilize self-data distilled fine-tuning to address these…
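The two stages named in the abstract, one-shot structured pruning followed by fine-tuning on self-distilled data, can be sketched with toy stand-ins. The sketch assumes that the original, unpruned model produces the fine-tuning targets. Everything else here (the `ToyModel` class, `prune_blocks`, `build_self_distilled_dataset`, and the choice of which layers to drop) is hypothetical and far simpler than a real LLM pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Layer = Callable[[float], float]


@dataclass
class ToyModel:
    """Stand-in for a decoder stack: a list of layers applied in order."""
    layers: List[Layer]

    def generate(self, x: float) -> float:
        for layer in self.layers:
            x = layer(x)
        return x


def prune_blocks(model: ToyModel, start: int, count: int) -> ToyModel:
    """One-shot structured pruning: drop `count` contiguous layers from `start`."""
    if start < 0 or count < 0 or start + count > len(model.layers):
        raise ValueError("block to prune is out of range")
    kept = model.layers[:start] + model.layers[start + count:]
    return ToyModel(layers=kept)


def build_self_distilled_dataset(
    teacher: ToyModel, prompts: List[float]
) -> List[Tuple[float, float]]:
    """Pair each prompt with the unpruned model's own output as the target."""
    return [(p, teacher.generate(p)) for p in prompts]


# Hypothetical four-layer model; each layer adds a constant.
original = ToyModel(layers=[lambda x, k=k: x + k for k in (1.0, 2.0, 3.0, 4.0)])
pruned = prune_blocks(original, start=1, count=2)  # drop the two middle layers
dataset = build_self_distilled_dataset(original, prompts=[0.0, 1.0])
# `dataset` would then serve as the fine-tuning set for `pruned`.
```

The toy shows only the data flow: the targets come from the original model's own output distribution rather than from external labels, which is the property intended to limit forgetting during recovery fine-tuning. In a real setting the targets would be generated text, not numbers.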
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Pruning · Balanced Selection · Shrink and Fine-Tune
