Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy

Razvan-Gabriel Dumitru; Paul-Ioan Clotan; Vikas Yadav; Darius Peteleaza; Mihai Surdeanu

arXiv:2411.03513·cs.CL·November 7, 2024

TL;DR

This paper presents a dynamic layer-specific pruning method for Large Language Models that uses a new Layer Redundancy score to improve efficiency and performance over traditional static slicing techniques.

Contribution

It introduces a novel dynamic slicing approach based on Layer Redundancy scores, advancing model compression for LLMs beyond existing static methods like SliceGPT.

Findings

01. Performance improved by up to 5% over the static slicing baseline.

02. Perplexity decreased by up to 7%.

03. The method maintained or enhanced model accuracy.

Abstract

This paper introduces a novel model compression approach through dynamic layer-specific pruning in Large Language Models (LLMs), enhancing the traditional methodology established by SliceGPT. By transitioning from constant to dynamic slicing, our method leverages the newly proposed Layer Redundancy (LR) score, which assesses how much each layer changes its input by measuring the cosine similarity of the layer's input to its output. We use this score to prune parts of individual layers based on redundancy in such a way that the average pruned percentage across all layers is a fixed value. We conducted extensive experiments using models like Llama3-8B and Mistral-7B on multiple datasets, evaluating different slicing bases and percentages to determine optimal configurations that balance efficiency and performance. Our findings show that our dynamic slicing approach not only…
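The two ideas in the abstract (a cosine-similarity redundancy score per layer, and per-layer pruning amounts whose average is fixed) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names are hypothetical, averaging the similarity over token vectors is an assumption, and the proportional allocation rule is just one simple choice that keeps the mean pruned fraction fixed; the excerpt does not say which mapping from score to pruned fraction the paper uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def layer_redundancy(layer_inputs, layer_outputs):
    """Hypothetical LR score for one layer.

    Averages the cosine similarity between each token's input vector and
    its output vector. A score near 1 means the layer barely changes its
    input. Averaging over tokens is an assumption of this sketch.
    """
    sims = [cosine_similarity(x, y) for x, y in zip(layer_inputs, layer_outputs)]
    return sum(sims) / len(sims)


def allocate_slicing(scores, average_fraction):
    """Spread a fixed average pruned fraction across layers.

    Assumed rule (not taken from the paper): each layer's fraction is
    proportional to its positive redundancy score, scaled so that the
    mean over all layers equals average_fraction.
    """
    mean_score = sum(scores) / len(scores)
    fractions = [average_fraction * s / mean_score for s in scores]
    # Guard against allocations that would prune more than a whole layer.
    if max(fractions) >= 1.0:
        raise ValueError("allocation exceeds 100% for at least one layer")
    return fractions
```

For example, with scores 0.9, 0.6 and 0.3 and an average fraction of 0.2, allocate_slicing assigns roughly 0.3, 0.2 and 0.1, so the most redundant layer is pruned hardest while the mean stays at 0.2.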


Code & Models

Repositories

razvandu/dynamicslicing (PyTorch, official)

Videos

Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy · underline

Taxonomy

Topics: Natural Language Processing Techniques · Network Packet Processing and Optimization

Methods: Pruning