SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Jiwon Song; Kyungseok Oh; Taesu Kim; Hyungjun Kim; Yulhwa Kim; Jae-Joon Kim

arXiv:2402.09025·cs.CL·December 16, 2024·1 cites

Open Access 1 Repo 5 Models

TL;DR

SLEB is a novel method that speeds up LLM inference by pruning redundant transformer blocks, reducing model size while preserving accuracy and keeping perplexity low.

Contribution

This paper introduces SLEB, a new block-level pruning technique that targets redundancy in transformer blocks to accelerate LLM inference.

Findings

01

SLEB achieves greater inference speedup than previous pruning methods.

02

SLEB keeps perplexity low and accuracy high after pruning.

03

SLEB effectively identifies and removes redundant transformer blocks.

Abstract

Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existing methods often struggle to achieve substantial end-to-end LLM inference speedup. In this paper, we introduce SLEB, a novel approach designed to streamline LLMs by eliminating redundant transformer blocks. We choose the transformer block as the fundamental unit for pruning, because LLMs exhibit block-level redundancy with high similarity between the outputs of neighboring blocks. This choice allows us to effectively enhance the processing speed of LLMs. Our experimental results demonstrate…
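The abstract's key observation, that neighboring blocks produce highly similar outputs, can be illustrated with a toy sketch. The code below is not the SLEB algorithm: the residual "blocks", the `make_block` helper, and the rule of dropping the block whose output is most similar to its input are all assumptions made here for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def make_block(scale, shift):
    """Hypothetical toy residual block: x -> x + scale * tanh(x + shift)."""
    def block(x):
        return [xi + scale * math.tanh(xi + shift) for xi in x]
    return block

def run(blocks, x):
    """Apply blocks in order and return the output of every block."""
    outputs = []
    for block in blocks:
        x = block(x)
        outputs.append(x)
    return outputs

def neighbor_similarity(blocks, x):
    """Similarity between each block's input and its output."""
    states = [x] + run(blocks, x)
    return [cosine(states[i], states[i + 1]) for i in range(len(blocks))]

def drop_most_redundant(blocks, x):
    """Remove the block that changes its input the least (illustrative rule)."""
    sims = neighbor_similarity(blocks, x)
    idx = max(range(len(sims)), key=sims.__getitem__)
    return idx, blocks[:idx] + blocks[idx + 1:]

# The middle block has a tiny residual scale, so it barely changes its input.
blocks = [make_block(1.0, 0.5), make_block(0.01, -0.3), make_block(0.8, 0.1)]
x = [0.2, -1.0, 0.7, 1.5]

sims = neighbor_similarity(blocks, x)
idx, pruned = drop_most_redundant(blocks, x)
full_out = run(blocks, x)[-1]
pruned_out = run(pruned, x)[-1]
print("similarities:", [round(s, 6) for s in sims])
print("dropped block:", idx)
print("full vs pruned output similarity:", round(cosine(full_out, pruned_out), 6))
```

In this toy setting the near-identity middle block is the one removed, and the pruned stack's final output stays close to the full stack's. A real pipeline would measure redundancy on an actual model with calibration data rather than on a single hand-picked vector.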

Code & Models

Repositories

jiwonsong-dev/sleb
Official

Models

Taxonomy

Topics: Power Transformer Diagnostics and Insulation · High voltage insulation and dielectric phenomena · Power Systems Fault Detection

Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings