LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models

David Hoffmann; Kailash Budhathoki; Matthaeus Kleindessner

arXiv:2410.13299 · cs.LG · December 2, 2024

TL;DR

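This paper introduces MLPRank and LLMRank, graph-theoretic structured pruning methods. MLPRank represents a multilayer perceptron as a weighted directed acyclic graph, scores its nodes with a modified weighted PageRank centrality measure, and prunes uniformly; LLMRank extends the approach to decoder-only transformer models. Both reduce compute and memory requirements while retaining more accuracy than the baselines.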

Contribution

The paper presents a pruning approach based on graph centrality measures for multilayer perceptrons (MLPRank), extends it to decoder-only transformer models (LLMRank), and reports higher accuracy retention than existing baselines for both.

Findings

01. MLPRank achieves 6.09% higher accuracy retention than baselines.
02. LLMRank achieves 13.42% higher accuracy retention than baselines.
03. Both methods effectively reduce computational and memory requirements.

Abstract

The evolving capabilities of large language models are accompanied by growing sizes and deployment costs, necessitating effective inference optimisation techniques. We propose a novel pruning method utilising centrality measures from graph theory, reducing both the computational requirements and the memory footprint of these models. Specifically, we devise a method for creating a weighted directed acyclical graph representation of multilayer perceptrons to which we apply a modified version of the weighted PageRank centrality measure to compute node importance scores. In combination with uniform pruning this leads to structured sparsity. We call this pruning method MLPRank. Furthermore we introduce an extension to decoder-only transformer models and call it LLMRank. For both variants we demonstrate a strong performance. With MLPRank on average leading to 6.09 % higher accuracy retention…
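
The pipeline the abstract describes (graph representation, centrality scores, uniform pruning) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not spell out their modification to weighted PageRank, so the sketch assumes plain weighted PageRank with absolute weights as edge weights, propagated once from layer to layer because the graph is acyclic. The function names `pagerank_scores` and `uniform_prune`, the damping value, and the toy weights are all hypothetical.

```python
def pagerank_scores(weights, damping=0.85):
    """Score every neuron of an MLP with a layer-wise weighted PageRank.

    weights[l][j][i] is the weight from neuron i in layer l to neuron j
    in layer l + 1. Assumption: edge weight = |weight|, and each source
    neuron splits its score among its out-edges in proportion to that.
    """
    n_in = len(weights[0][0])
    scores = [[1.0 / n_in] * n_in]  # uniform scores on the input layer
    for W in weights:
        prev = scores[-1]
        n_out = len(W)
        # Total outgoing |weight| of each source neuron.
        out_sum = [sum(abs(W[j][i]) for j in range(n_out))
                   for i in range(len(prev))]
        layer = []
        for j in range(n_out):
            inflow = sum(prev[i] * abs(W[j][i]) / out_sum[i]
                         for i in range(len(prev)) if out_sum[i] > 0)
            layer.append((1 - damping) / n_out + damping * inflow)
        scores.append(layer)
    return scores


def uniform_prune(scores, sparsity):
    """Return, for each hidden layer, the sorted indices of neurons to keep.

    Uniform pruning: the same fraction `sparsity` is removed from every
    hidden layer, dropping the lowest-scoring neurons first.
    """
    kept = []
    for layer in scores[1:-1]:  # skip input and output layers
        n_keep = max(1, round(len(layer) * (1 - sparsity)))
        order = sorted(range(len(layer)), key=lambda j: layer[j], reverse=True)
        kept.append(sorted(order[:n_keep]))
    return kept


# Toy MLP: 2 inputs -> 3 hidden neurons -> 1 output (made-up weights).
weights = [
    [[1.0, 1.0], [0.1, 0.1], [2.0, -2.0]],
    [[1.0, 1.0, 1.0]],
]
scores = pagerank_scores(weights)
kept = uniform_prune(scores, sparsity=0.3)
print(kept)
```

Dropping a hidden neuron removes a whole row of the incoming weight matrix and the matching column of the outgoing one, which is why this kind of pruning yields structured sparsity rather than scattered zeros.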

Code & Models

Repositories

amazon-science/llm-rank-pruning (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Large Language Model pruning based on weighted PageRank · Pruning