Can pruning make Large Language Models more efficient?

Sia Gholami; Marwan Omar

arXiv:2310.04573 · cs.LG · October 10, 2023

Open Access

TL;DR

This paper explores weight pruning techniques to reduce the size and computational demands of Transformer-based large language models, demonstrating that significant efficiency gains are possible with minimal performance loss.

Contribution

It provides a comprehensive analysis of pruning methodologies for Transformers, showing how to effectively reduce model size while maintaining or improving performance through fine-tuning.

Findings

01 Significant model size reductions achievable with minimal performance loss

02 Pruned models can exhibit enhanced generalization capabilities

03 Effective pruning strategies depend on hyperparameter selection

Abstract

Transformer models have revolutionized natural language processing with their unparalleled ability to grasp complex contextual relationships. However, the vast number of parameters in these models has raised concerns regarding computational efficiency, environmental impact, and deployability on resource-limited platforms. To address these challenges, this paper investigates the application of weight pruning, a strategic reduction of model parameters based on their significance, as an optimization strategy for Transformer architectures. Through extensive experimentation, we explore various pruning methodologies, highlighting their impact on model performance, size, and computational demands. Our findings suggest that with judicious selection of pruning hyperparameters, significant reductions in model size are attainable without considerable compromise on performance. Moreover, when coupled…
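To make "reduction of model parameters based on their significance" concrete, here is a minimal sketch of one common pruning criterion, unstructured magnitude pruning, where a weight's absolute value stands in for its significance. The abstract does not say which criterion the authors use, so the choice of magnitude, the function names `magnitude_prune` and `sparsity_of`, and the `sparsity` parameter are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def magnitude_prune(weights: Matrix, sparsity: float) -> Matrix:
    """Return a copy of `weights` with the smallest-magnitude entries zeroed.

    Assumption: "significance" is approximated by absolute value.
    `sparsity` is the fraction of entries to zero, in [0, 1].
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")

    # Rank every entry by absolute value, remembering its position.
    ranked = sorted(
        (abs(w), i, j)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
    )
    n_prune = int(len(ranked) * sparsity)

    # Copy the matrix so the caller's weights are left untouched.
    pruned = [list(row) for row in weights]
    for _, i, j in ranked[:n_prune]:
        pruned[i][j] = 0.0
    return pruned


def sparsity_of(weights: Matrix) -> float:
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total if total else 0.0


if __name__ == "__main__":
    layer = [[0.5, -0.1, 0.03], [-0.8, 0.2, -0.02]]
    pruned = magnitude_prune(layer, sparsity=0.5)
    print(pruned)
    print(sparsity_of(pruned))
```

In this sketch `sparsity` is the kind of pruning hyperparameter the abstract refers to: raising it shrinks the set of nonzero weights further, at a growing risk to accuracy. In practice a pruned model is usually fine-tuned afterwards with the zeroed positions held fixed, which the sketch does not cover.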

Taxonomy

Topics: Machine Learning and Data Classification · Topic Modeling · Natural Language Processing Techniques

Methods: Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Attention Is All You Need · Dense Connections · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection