Pruning General Large Language Models into Customized Expert Models
Yirao Zhao, Guizhen Chen, Kenji Kawaguchi, Lidong Bing, Wenxuan Zhang

TL;DR
This paper introduces Cus-Prun, a novel pruning method that efficiently creates compact expert language models tailored to specific domains, tasks, or languages without post-training, outperforming existing approaches.
Contribution
The paper presents Cus-Prun, a new pruning technique that directly produces lightweight expert models along language, domain, and task dimensions without additional training.
Findings
Cus-Prun outperforms existing pruning methods in preserving model capabilities.
It effectively creates expert models tailored to specific scenarios.
The method works across various model sizes and families.
Abstract
Large language models (LLMs) have revolutionized natural language processing, yet their large model sizes often demand substantial computational resources. To preserve computing resources and accelerate inference, it is crucial to prune redundant parameters, especially for experienced users who often need compact expert models tailored to specific downstream scenarios. However, most existing pruning methods focus on preserving the model's general capabilities, and they often require extensive post-training or suffer degraded performance due to coarse-grained pruning. In this work, we design a custom pruning method (Cus-Prun) to prune a large general model into a smaller, lightweight expert model positioned along the "language", "domain" and "task" dimensions. By identifying and pruning irrelevant neurons of each dimension,…
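The abstract describes the core idea as identifying and pruning neurons that are irrelevant along each of the language, domain and task dimensions. Because the abstract is truncated, the exact Cus-Prun criterion is not given here. The sketch below is only an illustration under assumed rules: each neuron has a precomputed importance score for every value of a dimension (for example, mean absolute activation on a hypothetical calibration set for "en" or "medical"). A neuron is treated as irrelevant for the target expert if it is important for some non-target value but not for the target value. The per-dimension irrelevant sets are then unioned into a keep mask. The names `irrelevant_neurons` and `keep_mask`, the score format and the threshold rule are all hypothetical and are not the paper's published method.

```python
from typing import Dict, List, Set

# Hypothetical layout: scores[value][i] = importance of neuron i for one value
# of a dimension (e.g. the language "en"). These scores are assumed to be precomputed.
Scores = Dict[str, List[float]]


def irrelevant_neurons(scores: Scores, target: str, threshold: float) -> Set[int]:
    """Return indices of neurons that matter for a non-target value but not the target.

    Assumed rule, for illustration only: a neuron is 'important' for a value
    if its score is at or above `threshold`.
    """
    n = len(scores[target])
    pruned: Set[int] = set()
    for i in range(n):
        important_for_target = scores[target][i] >= threshold
        important_elsewhere = any(
            vals[i] >= threshold for key, vals in scores.items() if key != target
        )
        # Neurons that no value uses strongly are kept, since they may be shared.
        if important_elsewhere and not important_for_target:
            pruned.add(i)
    return pruned


def keep_mask(dims: Dict[str, Scores], targets: Dict[str, str], threshold: float) -> List[bool]:
    """Combine the irrelevant sets of every dimension; True means keep the neuron."""
    first_dim = next(iter(dims.values()))
    n = len(next(iter(first_dim.values())))
    drop: Set[int] = set()
    for dim, scores in dims.items():
        drop |= irrelevant_neurons(scores, targets[dim], threshold)
    return [i not in drop for i in range(n)]


# Toy usage with 4 neurons and made-up scores (not data from the paper).
dims = {
    "language": {"en": [0.9, 0.1, 0.8, 0.2], "zh": [0.1, 0.9, 0.7, 0.1]},
    "domain": {"medical": [0.6, 0.6, 0.2, 0.1], "legal": [0.2, 0.3, 0.1, 0.9]},
}
targets = {"language": "en", "domain": "medical"}
print(keep_mask(dims, targets, threshold=0.5))  # [True, False, True, False]
```

In this toy example, neuron 1 is dropped because it serves "zh" but not "en", and neuron 3 is dropped because it serves "legal" but not "medical". Neuron 2 is kept even though "zh" also uses it, since the target "en" needs it as well. In a real model, dropping a feed-forward neuron would mean removing the matching row of the up-projection and column of the down-projection. That step is not shown here.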
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods
