Frustratingly Easy Task-aware Pruning for Large Language Models
Yuanhe Tian, Junjie Liu, Xican Yang, Haishan Ye, Yan Song

TL;DR
This paper introduces a simple task-aware pruning method for large language models that preserves domain- and task-specific capabilities during compression, outperforming conventional pruning methods that rank parameters by magnitude and general-domain calibration activations.
Contribution
The method extends conventional pruning algorithms by incorporating task-specific feature distributions, which helps pruned LLMs retain their specialized abilities.
Findings
Outperforms baseline pruning methods on benchmark tasks.
Effectively preserves task-specific performance after pruning.
Integrates with a range of existing base pruning techniques.
Abstract
Pruning provides a practical way to reduce the resources required to run large language models (LLMs), retaining their capabilities while controlling the cost of training and inference. Research on LLM pruning often ranks the importance of LLM parameters using their magnitudes and calibration-data activations and removes (or masks) the less important ones, reducing the LLMs' size accordingly. However, these approaches primarily focus on preserving the LLM's ability to generate fluent sentences, while neglecting performance on specific domains and tasks. In this paper, we propose a simple yet effective pruning approach for LLMs that preserves task-specific capabilities while shrinking their parameter space. We first analyze how conventional pruning minimizes loss perturbation under general-domain calibration and extend this formulation by incorporating task-specific…
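To make the idea of ranking parameters by magnitude and calibration-data activations concrete, here is a minimal pure-Python sketch. It is not the authors' algorithm: the abstract is truncated before the method is specified, so everything below is an assumption for illustration. The score (absolute weight times an activation norm for the matching input feature) follows the general style of activation-aware pruning metrics, and the blending of general-domain and task-specific activation statistics through a mixing weight `alpha` is a hypothetical stand-in for "incorporating task-specific feature distributions". All function names and the toy data are invented for this example.

```python
import math


def column_mean_squares(activations):
    """Mean squared activation per input feature over calibration samples."""
    n_features = len(activations[0])
    return [
        sum(sample[j] ** 2 for sample in activations) / len(activations)
        for j in range(n_features)
    ]


def importance_scores(weights, general_acts, task_acts, alpha=0.5):
    """Score each weight as |w_ij| times a blended activation norm for input j.

    alpha is a hypothetical mixing weight (an assumption, not from the paper):
    0 uses only general-domain calibration data, 1 uses only task data.
    """
    general = column_mean_squares(general_acts)
    task = column_mean_squares(task_acts)
    norms = [math.sqrt((1 - alpha) * g + alpha * t) for g, t in zip(general, task)]
    return [[abs(w) * norms[j] for j, w in enumerate(row)] for row in weights]


def prune_rows(weights, scores, sparsity=0.5):
    """Zero the lowest-scoring fraction of weights within each output row."""
    pruned = []
    for row, row_scores in zip(weights, scores):
        n_drop = int(len(row) * sparsity)
        order = sorted(range(len(row)), key=lambda j: row_scores[j])
        drop = set(order[:n_drop])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy data: the last input feature is inactive on general-domain text
# but strongly active on the (hypothetical) target task.
weights = [[0.9, 0.1, -0.5, 0.3]]
general_acts = [[1.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 0.0]]
task_acts = [[0.0, 0.0, 0.0, 4.0], [0.0, 0.0, 0.0, 4.0]]

general_only = prune_rows(
    weights, importance_scores(weights, general_acts, task_acts, alpha=0.0)
)
task_aware = prune_rows(
    weights, importance_scores(weights, general_acts, task_acts, alpha=0.5)
)
```

In this toy setup, scoring with general-domain activations alone removes the weight attached to the task-active feature, while the blended score keeps it and removes a weight that matters less for the task. That is the failure mode the abstract attributes to general-domain calibration, shown here only under the stated assumptions.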
