On Pruning State-Space LLMs
Tamer Ghattas, Michael Hassid, Roy Schwartz

TL;DR
This paper studies pruning for large language models built on state-space models (SSMs). It finds that these models are quite robust to some pruning methods, such as WANDA, while other methods cause rapid performance degradation.
Contribution
The paper adapts several pruning methods to the SSM structure and evaluates them on four SSM-based LLMs across multiple tasks, showing which methods these models tolerate.
Findings
SSM-based LLMs are quite robust to WANDA pruning
Some other pruning methods lead to rapid performance degradation
Robustness depends on the pruning method used
Abstract
Recent work proposed state-space models (SSMs) as an efficient alternative to transformer-based LLMs. Can these models be pruned to further reduce their computation costs? We adapt several pruning methods to the SSM structure, and apply them to four SSM-based LLMs across multiple tasks. We find that such models are quite robust to some pruning methods (e.g., WANDA), while using other methods leads to fast performance degradation.
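For readers unfamiliar with WANDA, the following is a minimal sketch of its scoring rule on a single linear layer: each weight is scored by its magnitude times the L2 norm of the matching input feature over a small calibration set, and the lowest-scoring weights in each output row are zeroed. This is the generic rule only, not the paper's adaptation to the SSM structure. The function name `wanda_prune`, the toy matrices, and the use of plain Python lists are assumptions made for illustration.

```python
import math

def wanda_prune(weights, activations, sparsity=0.5):
    """Return a pruned copy of `weights` (rows = output units, cols = input features).

    Hypothetical helper for illustration. Each weight is scored as
    |W[i][j]| * ||X[:, j]||_2, where X holds calibration activations
    (rows = tokens, cols = input features). The lowest-scoring weights
    are zeroed separately within each output row.
    """
    n_in = len(weights[0])
    # L2 norm of each input feature across the calibration tokens.
    norms = [math.sqrt(sum(row[j] ** 2 for row in activations)) for j in range(n_in)]
    n_prune = int(n_in * sparsity)
    pruned = []
    for w_row in weights:
        scores = [abs(w) * norms[j] for j, w in enumerate(w_row)]
        # Indices of the n_prune lowest scores in this row.
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(w_row)])
    return pruned

# Toy data (assumed): feature 0 has large activations, feature 1 tiny ones.
weights = [[0.1, -2.0, 0.5, 1.0],
           [3.0, 0.2, -0.4, 0.05]]
activations = [[10.0, 0.1, 1.0, 1.0],
               [10.0, 0.1, 1.0, 1.0]]
pruned = wanda_prune(weights, activations, sparsity=0.5)
print(pruned)
```

In the first row of this toy example the largest weight, -2.0, is removed because its input feature is nearly inactive, while the small weight 0.1 survives because its input is large. Plain magnitude pruning would have made the opposite choice, which is the difference the activation-aware score is meant to capture.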
Taxonomy
Topics: Time Series Analysis and Forecasting · Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing
