Pruning Large Language Models by Identifying and Preserving Functional Networks
Yiheng Liu, Junhao Ning, Sichen Xia, Xiaohui Gao, Ning Qiang, Bao Ge, Junwei Han, Xintao Hu

TL;DR
This paper introduces a novel structured pruning method for large language models that preserves functional neural networks, maintaining model performance while reducing size and computational requirements.
Contribution
It proposes a new approach to pruning LLMs by identifying and preserving functional networks, inspired by brain neural network analysis, improving pruning effectiveness.
Findings
Successfully identifies functional networks in LLMs
Preserves key neurons within networks during pruning
Achieves efficient model compression with maintained performance
Abstract
Structured pruning is one of the representative techniques for compressing large language models (LLMs) to reduce GPU memory consumption and accelerate inference. It offers significant practical value in improving the efficiency of LLMs in real-world applications. Current structured pruning methods typically assess the importance of structural units and prune the less important ones. Most of them overlook the interaction and collaboration among artificial neurons that are crucial for the functionalities of LLMs, which disrupts the macro functional architecture of LLMs and consequently degrades performance after pruning. Inspired by the inherent similarities between artificial neural networks and functional neural networks in the human brain, we alleviate this challenge and propose to prune LLMs by identifying and preserving functional…
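For readers unfamiliar with the baseline the abstract criticizes, the following is a minimal sketch of importance-based structured pruning applied to one feed-forward block. It is not the method proposed in the paper: the function name, the weight layout, and the weight-norm importance score are all assumptions chosen for illustration. Each hidden neuron is scored independently, which is exactly the kind of per-unit assessment that ignores interactions among neurons.

```python
import numpy as np

def prune_mlp_neurons(w_in, w_out, keep_ratio):
    """Prune hidden neurons of a two-layer MLP block by per-neuron importance.

    Hypothetical layout (an assumption, not taken from the paper):
      w_in  has shape (hidden, d_model): row i feeds hidden neuron i
      w_out has shape (d_model, hidden): column i reads hidden neuron i
    """
    hidden = w_in.shape[0]
    n_keep = max(1, int(round(hidden * keep_ratio)))

    # Assumed importance score: product of the L2 norms of each neuron's
    # incoming and outgoing weights. Every neuron is scored in isolation.
    importance = np.linalg.norm(w_in, axis=1) * np.linalg.norm(w_out, axis=0)

    # Keep the n_keep highest-scoring neurons, preserving original order.
    keep = np.sort(np.argsort(importance)[-n_keep:])

    # Structured pruning removes whole rows/columns, so the layer shrinks.
    return w_in[keep, :], w_out[:, keep], keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_in = rng.normal(size=(8, 4))
    w_out = rng.normal(size=(4, 8))
    p_in, p_out, kept = prune_mlp_neurons(w_in, w_out, keep_ratio=0.5)
    print(p_in.shape, p_out.shape, kept)
```

Because entire neurons are removed, the pruned matrices are smaller dense matrices, which is what yields the memory and inference savings the abstract mentions. A method that preserves functional networks would instead have to decide which neurons to keep jointly rather than one at a time.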
