Language model compression with weighted low-rank factorization

Yen-Chang Hsu; Ting Hua; Sungen Chang; Qian Lou; Yilin Shen; Hongxia Jin

arXiv:2207.00112·cs.LG·July 4, 2022·20 cites

TL;DR

This paper introduces Fisher-Weighted SVD (FWSVD), a matrix factorization method for language model compression that weights the low-rank approximation by each parameter's importance to the task, so that task performance is better retained at higher compression rates.

Contribution

The paper proposes FWSVD, which incorporates Fisher information into SVD to prioritize important parameters, improving task accuracy preservation during model compression.
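
The mismatch the paper targets can be written as two objectives. The notation here is an assumption chosen for illustration ($W$ is the trained weight matrix, $AB$ its rank-$r$ factorization, $\hat{I}_{ij}$ an estimated Fisher information for weight $w_{ij}$ computed from task loss gradients over a dataset $D$). It is a sketch of the idea, not a transcription of the paper's equations.

```latex
% Plain SVD: every entry of W counts equally in the reconstruction error.
\min_{A,B}\ \sum_{i,j} \bigl( W_{ij} - (AB)_{ij} \bigr)^2

% Fisher-weighted: errors on entries that matter more to the task cost more.
\min_{A,B}\ \sum_{i,j} \hat{I}_{ij}\, \bigl( W_{ij} - (AB)_{ij} \bigr)^2,
\qquad
\hat{I}_{ij} \approx \frac{1}{|D|} \sum_{d \in D}
  \Bigl( \frac{\partial \mathcal{L}(d; W)}{\partial w_{ij}} \Bigr)^2
```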

Findings

1. FWSVD preserves task accuracy better than standard SVD.

2. Applied to an already compact model, the method removes a further 9% to 30% of parameters with minimal accuracy loss.

3. FWSVD compresses a task-specific model directly and outperforms other compact-model strategies that require expensive pre-training.

Abstract

Factorizing a large matrix into small matrices is a popular strategy for model compression. Singular value decomposition (SVD) plays a vital role in this compression strategy, approximating a learned matrix with fewer parameters. However, SVD minimizes the squared error toward reconstructing the original matrix without gauging the importance of the parameters, potentially giving a larger reconstruction error for those that affect task accuracy more. In other words, the optimization objective of SVD is not aligned with the trained model's task accuracy. We analyze this previously unexplored problem, make observations, and address it by introducing Fisher information to weigh the importance of parameters affecting the model prediction. This idea leads to our method: Fisher-Weighted SVD (FWSVD). Although the factorized matrices from our approach do not result in smaller reconstruction…
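
A minimal NumPy sketch of the contrast the abstract draws, assuming importance is aggregated per row of the weight matrix so that the weighted problem reduces to an ordinary SVD of a row-scaled matrix. The function names, the `eps` guard, and the random stand-in for Fisher information are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def truncated_svd(W, rank):
    """Plain rank-`rank` factorization W ~= A @ B via SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # (n, rank): left vectors scaled by singular values
    B = Vt[:rank, :]            # (rank, m)
    return A, B


def fisher_weighted_svd(W, fisher, rank, eps=1e-8):
    """Row-weighted factorization sketch (assumed simplification).

    `fisher` is a nonnegative array shaped like W, e.g. an average of
    squared task-loss gradients. Each row's importance is the sum of its
    entries; `eps` only guards against division by zero.
    """
    row_weight = np.sqrt(fisher.sum(axis=1)) + eps         # (n,)
    A_scaled, B = truncated_svd(row_weight[:, None] * W, rank)
    A = A_scaled / row_weight[:, None]                     # undo the row scaling
    return A, B


def weighted_error(W, A, B, row_importance):
    """Squared reconstruction error with each row scaled by its importance."""
    return float(np.sum(row_importance[:, None] * (W - A @ B) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 32))
    # Hypothetical stand-in for Fisher information: squared random "gradients"
    # whose magnitude varies strongly from row to row.
    grads = rng.standard_normal((16, 64, 32)) * rng.uniform(0.1, 10.0, size=(64, 1))
    fisher = np.mean(grads ** 2, axis=0)
    importance = fisher.sum(axis=1)

    for name, (A, B) in {
        "SVD": truncated_svd(W, 8),
        "FWSVD": fisher_weighted_svd(W, fisher, 8),
    }.items():
        plain = float(np.sum((W - A @ B) ** 2))
        print(name, "plain error:", plain,
              "weighted error:", weighted_error(W, A, B, importance))
```

In this sketch the weighted variant gives up some plain reconstruction error in exchange for lower importance-weighted error, which is the trade-off the abstract describes. Whether that translates into task accuracy depends on how well the importance estimate tracks the real model, which random data cannot show.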

Videos

Language model compression with weighted low-rank factorization · SlidesLive

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques