Self-Data Distillation for Recovering Quality in Pruned Large Language Models

Vithursan Thangarasa; Ganesh Venkatesh; Mike Lasby; Nish Sinnadurai; Sean Lie

arXiv:2410.09982 · cs.LG · May 13, 2025

TL;DR

This paper introduces self-data distillation to recover the quality of pruned large language models. The approach outperforms standard supervised fine-tuning, and the pruned models are cheaper to run at inference time.

Contribution

It proposes self-data distilled fine-tuning for pruned models, which reduces both the quality loss caused by pruning and the catastrophic forgetting that standard supervised fine-tuning can introduce.

Findings

01. Self-data distillation outperforms standard supervised fine-tuning.

02. Retains 91.2% of the original model's accuracy after pruning, compared to 81.7% with standard supervised fine-tuning.

03. Reduces FLOPs by 16.3%, improving inference efficiency.

Abstract

Large language models have driven significant progress in natural language processing, but their deployment requires substantial compute and memory resources. As models scale, compression techniques become essential for balancing model quality with computational efficiency. Structured pruning, which removes less critical components of the model, is a promising strategy for reducing complexity. However, one-shot pruning often results in significant quality degradation, particularly in tasks requiring multi-step reasoning. To recover lost quality, supervised fine-tuning (SFT) is commonly applied, but it can lead to catastrophic forgetting by shifting the model's learned data distribution. Therefore, addressing the degradation from both pruning and SFT is essential to preserve the original model's quality. In this work, we utilize self-data distilled fine-tuning to address these…
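
The pipeline the abstract describes (one-shot structured pruning, then fine-tuning on self-distilled data) can be sketched roughly as below. This is a toy illustration, not the authors' implementation. The abstract shown here is truncated and does not spell out the procedure, so the sketch assumes that the unpruned model rewrites each reference response and that a rewrite is kept only when it agrees with the original label. All names (`prune_blocks`, `self_distill`, `toy_rewrite`, `toy_is_consistent`) are hypothetical, and the "model" is just a list of block names.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]


def prune_blocks(blocks: List[str], start: int, count: int) -> List[str]:
    """One-shot structured pruning: drop `count` contiguous blocks from `start`."""
    if start < 0 or count < 0 or start + count > len(blocks):
        raise ValueError("pruned span must lie inside the model")
    return blocks[:start] + blocks[start + count:]


def self_distill(
    examples: List[Example],
    rewrite: Callable[[str, str], str],
    is_consistent: Callable[[str, str], bool],
) -> List[Example]:
    """Build a fine-tuning set whose targets come from the unpruned model.

    Assumption: `rewrite` stands in for the original (unpruned) model
    restating the reference response in its own words. If the rewrite
    disagrees with the reference, the original label is kept instead.
    """
    distilled = []
    for ex in examples:
        candidate = rewrite(ex["prompt"], ex["response"])
        keep = is_consistent(candidate, ex["response"])
        distilled.append(
            {"prompt": ex["prompt"], "response": candidate if keep else ex["response"]}
        )
    return distilled


# Toy stand-ins for the unpruned model and the answer check (hypothetical).
def toy_rewrite(prompt: str, reference: str) -> str:
    if "capital" in prompt:
        return f"Let me think. The answer is {reference}"
    return "I am not sure."


def toy_is_consistent(candidate: str, reference: str) -> bool:
    return candidate.endswith(reference)


blocks = [f"block_{i}" for i in range(8)]
pruned = prune_blocks(blocks, start=4, count=2)  # removes block_4 and block_5

data = [
    {"prompt": "What is the capital of France?", "response": "Paris"},
    {"prompt": "2 + 2 = ?", "response": "4"},
]
distilled = self_distill(data, toy_rewrite, toy_is_consistent)
```

In a real setting the pruned model would then be fine-tuned on `distilled` rather than on `data`. The intent is that the targets stay close to the original model's own output distribution, which is the property the abstract links to reduced catastrophic forgetting.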

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Pruning · Balanced Selection · Shrink and Fine-Tune