FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models
Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi

TL;DR
FinerCut introduces a fine-grained, interpretable layer pruning method for large language models that retains high performance while significantly reducing model size without fine-tuning.
Contribution
FinerCut treats each self-attention and FFN layer, rather than each whole transformer block, as an individual pruning candidate, which allows finer-grained compression and makes the resulting pruning patterns easier to interpret.
Findings
Retains 90% of Llama3-8B performance with 25% of its layers removed
Removes 42% of self-attention layers in Llama3-70B while preserving 99% of its performance
Provides insights into layer pruning patterns and behaviors
Abstract
Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs). However, such models contain billions of parameters, making large compute a necessity while raising environmental concerns. To address these issues, we propose FinerCut, a new form of fine-grained layer pruning, which, in contrast to prior work at the transformer block level, considers all self-attention and feed-forward network (FFN) layers within blocks as individual pruning candidates. FinerCut prunes layers whose removal causes minimal alteration to the model's output, contributing to a new, lean, interpretable, and task-agnostic pruning method. Tested across 9 benchmarks, our approach retains 90% of the performance of Llama3-8B with 25% of layers removed, and 95% of the performance of Llama3-70B with 30% of layers removed, all without fine-tuning or post-pruning reconstruction. Strikingly,…
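The selection rule described in the abstract (remove the sublayer whose removal changes the model's output the least) can be illustrated with a small toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy residual stack, the helper names (`make_toy_layers`, `forward`, `output_shift`, `greedy_prune`), the mean-squared-difference metric, and the one-at-a-time greedy search. The abstract does not specify the actual metric or search procedure.

```python
import numpy as np

def make_toy_layers(n_blocks, dim, seed=0):
    """Toy residual stack: every block contributes one 'attn' and one 'ffn' sublayer.
    These are plain random maps standing in for real attention/FFN layers (assumption)."""
    rng = np.random.default_rng(seed)
    layers = []
    for b in range(n_blocks):
        for kind in ("attn", "ffn"):
            # Random scale so that some sublayers matter much less than others.
            scale = rng.uniform(0.01, 1.0)
            weight = rng.standard_normal((dim, dim)) / np.sqrt(dim)
            layers.append({"name": f"block{b}.{kind}", "W": weight * scale})
    return layers

def forward(layers, x, removed=frozenset()):
    """Run the stack, skipping every sublayer whose name is in `removed`."""
    h = x
    for layer in layers:
        if layer["name"] in removed:
            continue  # the residual path carries h through unchanged
        h = h + np.tanh(h @ layer["W"])
    return h

def output_shift(layers, x, reference, removed):
    """Mean squared difference between pruned and original outputs (assumed metric)."""
    return float(np.mean((forward(layers, x, removed) - reference) ** 2))

def greedy_prune(layers, x, n_remove):
    """Repeatedly remove the sublayer whose removal shifts the output the least."""
    reference = forward(layers, x)
    removed = set()
    for _ in range(n_remove):
        candidates = [l["name"] for l in layers if l["name"] not in removed]
        best = min(
            candidates,
            key=lambda name: output_shift(layers, x, reference, removed | {name}),
        )
        removed.add(best)
    return removed

# Usage: prune 2 of the 8 sublayers (4 blocks x 2) on a small calibration batch.
layers = make_toy_layers(n_blocks=4, dim=8)
calib = np.random.default_rng(1).standard_normal((16, 8))
pruned = greedy_prune(layers, calib, n_remove=2)
print(sorted(pruned))
```

Because attention and FFN sublayers are separate candidates, the selected set can contain, for example, only the attention half of a block, which a block-level method could not express.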
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Pruning
