Breaking Expert Knowledge Limits: Self-Pruning for Large Language Models
Haidong Kang, Lihong Lin, Enneng Yang, Hongning Dai, Hao Wang

TL;DR
This paper introduces AutoPrune, a novel method that enables large language models to automatically design their own pruning algorithms, overcoming the limits of expert knowledge and addressing the outlier value issue to improve pruning performance.
Contribution
AutoPrune is the first approach that lets LLMs prune themselves without expert-designed algorithms. It uses GCoT for prompt optimization and SDSA for adaptive sparsity allocation, significantly improving pruning performance.
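SDSA replaces uniform sparsity with an adaptive allocation, but this summary does not give its formula. To illustrate what a non-uniform allocation can look like, here is a hypothetical sketch, loosely in the spirit of outlier-weighted layerwise sparsity (OWL): layers with a larger fraction of outlier importance scores receive lower sparsity, while the average across layers stays at the target. The functions `outlier_ratio` and `allocate_sparsity`, the outlier threshold `m`, and the spread `lam` are assumptions made for illustration; this is not SDSA itself.

```python
# Hypothetical sketch of adaptive (non-uniform) layer sparsity; NOT the paper's SDSA.

def outlier_ratio(scores, m=5.0):
    """Fraction of a layer's importance scores above m times the layer mean.

    Hypothetical outlier definition used only for this illustration.
    """
    mean = sum(scores) / len(scores)
    return sum(s > m * mean for s in scores) / len(scores)


def allocate_sparsity(ratios, target, lam):
    """Assign per-layer sparsity: more outliers -> lower sparsity.

    Deviations are scaled into [-lam, +lam] and sum to zero, so the
    average sparsity across layers stays equal to `target`.
    """
    if target - lam < 0.0 or target + lam > 1.0:
        raise ValueError("target +/- lam must stay inside [0, 1]")
    mean_r = sum(ratios) / len(ratios)
    max_dev = max(abs(r - mean_r) for r in ratios)
    if max_dev == 0.0:
        # All layers look alike: fall back to uniform sparsity.
        return [target] * len(ratios)
    return [target + lam * (mean_r - r) / max_dev for r in ratios]


# Usage: three layers whose outlier ratios grow with depth.
ratios = [0.01, 0.03, 0.05]
print(allocate_sparsity(ratios, target=0.7, lam=0.1))  # approximately [0.8, 0.7, 0.6]
```

Because the per-layer deviations sum to zero, the overall parameter budget matches a uniform 70% sparsity; only its distribution across layers changes.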
Findings
AutoPrune outperforms state-of-the-art pruning methods.
GCoT improves reasoning in pruning algorithm design.
SDSA mitigates performance loss at high pruning ratios.
Abstract
Large language models (LLMs) have achieved remarkable performance on a wide range of tasks, but their massive size hinders real-world deployment. Existing pruning methods tailored for LLMs (e.g., Wanda) rely heavily on manually designed pruning algorithms, which incur huge labor costs and require expert knowledge. Furthermore, we are the first to identify a serious outlier value issue, caused by uniform sparsity, that underlies the dramatic performance degradation at high pruning ratios; this raises the additional question of how to design adaptive pruning sparsity suited to LLMs. Can LLMs prune by themselves? In this work, we give an affirmative answer by proposing a novel pruning method called AutoPrune, which is the first to overcome expert knowledge limits by leveraging LLMs to design optimal pruning algorithms for themselves automatically…
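The abstract contrasts AutoPrune with Wanda, a hand-designed LLM pruning method, and attributes the high-ratio failure to uniform sparsity. For reference, a minimal pure-Python sketch of Wanda-style pruning under uniform per-row sparsity follows, assuming Wanda's published score |W_ij| · ||X_j||_2 (weight magnitude times the L2 norm of the matching input feature over calibration tokens), with weights compared within each output row. The function names, toy matrices, and list-of-lists representation are illustrative assumptions; this is not the paper's code or an AutoPrune-generated algorithm.

```python
import math


def input_feature_norms(X):
    """L2 norm of each input feature (column) of X, shaped (n_tokens, d_in)."""
    return [math.sqrt(sum(row[j] ** 2 for row in X)) for j in range(len(X[0]))]


def wanda_prune(W, X, sparsity):
    """Prune W (shape d_out x d_in) with the Wanda score |W_ij| * ||X_j||_2.

    Every output row drops the same fraction of its weights, i.e. the
    sparsity is uniform across rows (and across layers if reused as-is).
    """
    norms = input_feature_norms(X)
    d_in = len(W[0])
    k = int(d_in * sparsity)  # number of weights to zero in every row
    pruned = []
    for row in W:
        scores = [abs(w) * n for w, n in zip(row, norms)]
        # Indices of the k lowest-scoring weights in this row.
        drop = set(sorted(range(d_in), key=lambda j: scores[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


W = [[1.0, -2.0, 0.5, 4.0],
     [0.2, 0.1, -4.0, 1.0]]
# Two calibration tokens; feature 1 has tiny activations, feature 2 large ones.
X = [[1.0, 0.1, 3.0, 0.5],
     [1.0, 0.1, 3.0, 0.5]]
print(wanda_prune(W, X, sparsity=0.5))
# [[0.0, 0.0, 0.5, 4.0], [0.0, 0.0, -4.0, 1.0]]
```

In row 0 the weight -2.0 is pruned even though it is larger than 0.5, because its input feature is nearly silent; plain magnitude pruning would have kept it.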
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
