LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models
David Hoffmann, Kailash Budhathoki, Matthaeus Kleindessner

TL;DR
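This paper introduces MLPRank, a graph-theoretic pruning method for multilayer perceptrons, and LLMRank, its extension to decoder-only transformer models. Both score nodes with a modified weighted PageRank centrality measure computed on a graph representation of the model, reducing compute and memory requirements while retaining more accuracy than baselines.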
Contribution
The paper presents a pruning approach based on graph centrality measures, extends it to decoder-only transformer models, and reports higher accuracy retention than existing baselines.
Findings
MLPRank achieves, on average, 6.09% higher accuracy retention than baselines.
LLMRank achieves 13.42% higher accuracy retention than baselines.
Both methods effectively reduce computational and memory requirements.
Abstract
The evolving capabilities of large language models are accompanied by growing sizes and deployment costs, necessitating effective inference optimisation techniques. We propose a novel pruning method utilising centrality measures from graph theory, reducing both the computational requirements and the memory footprint of these models. Specifically, we devise a method for creating a weighted directed acyclic graph representation of multilayer perceptrons, to which we apply a modified version of the weighted PageRank centrality measure to compute node importance scores. In combination with uniform pruning, this leads to structured sparsity. We call this pruning method MLPRank. Furthermore, we introduce an extension to decoder-only transformer models and call it LLMRank. For both variants we demonstrate strong performance, with MLPRank on average leading to 6.09 % higher accuracy retention…
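The pipeline the abstract describes (graph construction, centrality scoring, uniform structured pruning) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the "modified" PageRank, so the sketch assumes plain weighted PageRank with edge weights taken as absolute weight magnitudes, a damping factor of 0.85, and uniform redistribution from the output neurons. The names `pagerank_scores` and `uniform_prune` are hypothetical.

```python
import numpy as np


def pagerank_scores(weights, damping=0.85, iters=200, tol=1e-12):
    """Score every neuron of an MLP with weighted PageRank (assumed variant).

    weights[l] has shape (n_out, n_in). Each neuron is a node; an edge runs
    from neuron j in layer l to neuron i in layer l + 1 with weight |W[i, j]|,
    which yields a weighted directed acyclic graph.
    """
    sizes = [weights[0].shape[1]] + [w.shape[0] for w in weights]
    offsets = np.concatenate([[0], np.cumsum(sizes)])
    n = int(offsets[-1])

    # adj[target, source] holds the edge weight source -> target.
    adj = np.zeros((n, n))
    for l, w in enumerate(weights):
        adj[offsets[l + 1]:offsets[l + 2], offsets[l]:offsets[l + 1]] = np.abs(w)

    # Column-normalise so each node's outgoing weights sum to 1. Nodes without
    # outgoing edges (the output layer) spread their mass uniformly.
    out = adj.sum(axis=0)
    safe_out = np.where(out > 0, out, 1.0)
    trans = np.where(out > 0, adj / safe_out, 1.0 / n)

    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        new_rank = (1.0 - damping) / n + damping * trans @ rank
        converged = np.abs(new_rank - rank).sum() < tol
        rank = new_rank
        if converged:
            break

    # Split the flat score vector back into one array per layer.
    return [rank[offsets[l]:offsets[l + 1]] for l in range(len(sizes))]


def uniform_prune(weights, biases, scores, sparsity):
    """Drop the lowest-scoring `sparsity` fraction of neurons in each hidden layer.

    Removing a neuron deletes its row in the current weight matrix and its
    column in the next one, so the result is a smaller dense MLP
    (structured sparsity). Input and output layers are left intact.
    """
    keep_prev = np.arange(weights[0].shape[1])
    new_weights, new_biases = [], []
    for l, (w, b) in enumerate(zip(weights, biases)):
        n_out = w.shape[0]
        if l < len(weights) - 1:
            n_keep = max(1, int(round(n_out * (1.0 - sparsity))))
            keep = np.sort(np.argsort(scores[l + 1])[-n_keep:])
        else:
            keep = np.arange(n_out)
        new_weights.append(w[np.ix_(keep, keep_prev)])
        new_biases.append(b[keep])
        keep_prev = keep
    return new_weights, new_biases
```

With `sparsity=0.5`, an MLP with layer widths 4, 8, 6, 3 would shrink to widths 4, 4, 3, 3 under this sketch. "Uniform" is interpreted here as applying the same pruning ratio to every hidden layer.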
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Large Language Model pruning based on weighted PageRank · Pruning
