Low-Rank Prune-And-Factorize for Language Model Compression

Siyu Ren; Kenny Q. Zhu

arXiv:2306.14152·cs.CL·June 27, 2023·2 cites

TL;DR

This paper introduces a novel approach combining pruning and matrix factorization to effectively compress large language models by exploiting low-rank sparsity patterns, resulting in better performance at high compression rates.

Contribution

The paper identifies the full-rankness of fine-tuned PLMs as the bottleneck that makes matrix factorization fail, and proposes sparsity-aware SVD and mixed-rank fine-tuning to improve model compression.

Findings

01

Outperforms existing methods in compression-performance trade-off.

02

Low-rank sparsity patterns are found only in models produced by first-order pruning.

03

Proposed techniques enhance initialization and training for better compression.
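
To make the second finding concrete, here is a minimal sketch contrasting a magnitude (zeroth-order) pruning criterion with a first-order one. It assumes "first-order pruning" refers to a criterion built from the first-order Taylor term, weight times gradient; the page itself does not spell out the criterion, and the function names and toy numbers are hypothetical.

```python
# Toy illustration only: the weights and gradients are made-up values,
# and the scoring rule is an assumed first-order (weight * gradient) criterion.

def magnitude_scores(weights):
    """Zeroth-order importance: the absolute value of each weight."""
    return [abs(w) for w in weights]


def first_order_scores(weights, grads):
    """First-order importance: -w * dL/dw.

    This is the first-order Taylor estimate of how much the loss would
    rise if the weight were set to zero, so a higher score means "keep".
    """
    return [-(w * g) for w, g in zip(weights, grads)]


def keep_mask(scores, keep_ratio):
    """Keep the top `keep_ratio` fraction of entries by score.

    Ties at the threshold are all kept, so slightly more than the
    requested fraction may survive when scores are equal.
    """
    k = max(1, round(len(scores) * keep_ratio))
    threshold = sorted(scores, reverse=True)[k - 1]
    return [s >= threshold for s in scores]


weights = [0.9, -0.05, 0.4, -0.6]
grads = [0.1, 2.0, -0.5, -0.01]

magnitude_mask = keep_mask(magnitude_scores(weights), keep_ratio=0.5)
first_order_mask = keep_mask(first_order_scores(weights, grads), keep_ratio=0.5)

print(magnitude_mask)    # [True, False, False, True]
print(first_order_mask)  # [False, True, True, False]
```

With these toy values the two criteria keep disjoint sets of weights: the magnitude rule keeps the largest entries, while the first-order rule keeps the entries whose removal is estimated to hurt the loss most. This only shows that the criteria can disagree; it does not by itself demonstrate the low-rank pattern the paper reports.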

Abstract

The components underpinning pre-trained language models (PLMs), namely large weight matrices, have been shown to bear considerable redundancy. Matrix factorization, a well-established technique from matrix theory, has been used to reduce the number of parameters in PLMs. However, it fails to retain satisfactory performance under moderate to high compression rates. In this paper, we identify the full-rankness of fine-tuned PLMs as the fundamental bottleneck behind the failure of matrix factorization and explore the use of network pruning to extract a low-rank sparsity pattern amenable to matrix factorization. We find that such a low-rank sparsity pattern exists exclusively in models generated by first-order pruning, which motivates us to unite the two approaches and achieve more effective model compression. We further propose two techniques, sparsity-aware SVD and mixed-rank fine-tuning, which improve the initialization and…
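
The parameter arithmetic behind matrix factorization can be sketched as follows. Replacing an m × n weight matrix with two factors of shapes m × r and r × n stores r(m + n) values instead of mn, so factorization only saves parameters when r is below mn / (m + n). The 768 × 3072 shape is an illustrative assumption (the feed-forward projection size of a BERT-base-style model), not a figure taken from this page, and the helper names are hypothetical.

```python
# Parameter counts for a dense matrix versus a rank-r factorization.
# The 768 x 3072 shape below is an assumed example, not from the paper.

def dense_params(m, n):
    """Number of entries in a dense m x n weight matrix."""
    return m * n


def factorized_params(m, n, r):
    """Entries in an (m x r) factor plus an (r x n) factor."""
    return r * (m + n)


def break_even_rank(m, n):
    """Rank at which the factorized form stores as many values as the dense one."""
    return (m * n) / (m + n)


def compression_ratio(m, n, r):
    """Dense parameter count divided by factorized parameter count."""
    return dense_params(m, n) / factorized_params(m, n, r)


m, n = 768, 3072

print(dense_params(m, n))            # 2359296
print(factorized_params(m, n, 128))  # 491520
print(compression_ratio(m, n, 128))
print(break_even_rank(m, n))

# A full-rank matrix of this shape has rank min(m, n) = 768, which is above
# the break-even rank, so an exact factorization would store more values
# than the dense matrix does.
print(factorized_params(m, n, min(m, n)) > dense_params(m, n))  # True
```

This is why full-rankness is a bottleneck: any rank low enough to compress the matrix must discard part of a full-rank matrix's spectrum, and the abstract attributes the resulting performance loss to exactly that.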

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Pruning