Numerical Optimizations for Weighted Low-rank Estimation on Language Model
Ting Hua, Yen-Chang Hsu, Felicity Wang, Qian Lou, Yilin Shen, Hongxia Jin

TL;DR
This paper introduces a weighted low-rank estimation method tailored for language models, addressing the limitations of standard SVD by considering parameter importance, and demonstrates its effectiveness in model compression.
Contribution
It proposes a novel weighted low-rank decomposition approach optimized for language models, improving compression performance over existing methods.
Findings
Outperforms state-of-the-art compression techniques on Transformer models
Provides a metric to predict when SVD causes performance drops
Demonstrates effectiveness through extensive evaluations
Abstract
Singular value decomposition (SVD) is one of the most popular compression methods that approximate a target matrix with smaller matrices. However, standard SVD treats the parameters within the matrix with equal importance, which is a simple but unrealistic assumption. The parameters of a trained neural network model may affect task performance unevenly, which suggests non-equal importance among the parameters. Compared to SVD, the decomposition method aware of parameter importance is the more practical choice in real cases. Unlike standard SVD, weighted value decomposition is a non-convex optimization problem that lacks a closed-form solution. We systematically investigated multiple optimization strategies to tackle the problem and examined our method by compressing Transformer-based language models. Further, we designed a metric to predict when the SVD may introduce a significant…
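To make the contrast in the abstract concrete, here is a minimal NumPy sketch comparing plain truncated SVD with an importance-weighted low-rank fit. It is an illustration only, not the paper's method: the abstract does not say which optimization strategies the authors used, so the sketch swaps in a generic majorize-minimize iteration that repeatedly takes a truncated SVD of a blend of the target and the current estimate. The importance matrix `imp`, the matrix sizes, the rank `k`, and all function names are hypothetical choices made for this example.

```python
import numpy as np


def truncated_svd(M, k):
    """Best rank-k approximation of M under the unweighted Frobenius norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def weighted_error(W, X, imp):
    """Importance-weighted squared reconstruction error."""
    return float(np.sum(imp * (W - X) ** 2))


def weighted_low_rank(W, imp, k, n_iter=50):
    """Generic iterative weighted rank-k fit (assumed technique, not the paper's).

    Weights are rescaled into [0, 1]. Each step fills the "unimportant" part of
    the target with the current estimate and re-truncates, which never
    increases the weighted error.
    """
    imp = imp / imp.max()
    X = truncated_svd(W, k)  # start from the plain SVD solution
    for _ in range(n_iter):
        X = truncated_svd(imp * W + (1.0 - imp) * X, k)
    return X


# Hypothetical data: a random "weight matrix" and random positive importances.
rng = np.random.default_rng(0)
W = rng.standard_normal((20, 15))
imp = rng.uniform(0.01, 1.0, size=W.shape)
k = 4

X_svd = truncated_svd(W, k)
X_wtd = weighted_low_rank(W, imp, k)

err_svd = weighted_error(W, X_svd, imp)
err_wtd = weighted_error(W, X_wtd, imp)
print(f"weighted error, plain SVD:    {err_svd:.4f}")
print(f"weighted error, weighted fit: {err_wtd:.4f}")
```

Because the iteration starts from the plain SVD solution and each step cannot increase the weighted objective, the weighted fit should do at least as well as SVD on the importance-weighted error. When every entry of `imp` is equal, the two coincide, which matches the abstract's point that standard SVD implicitly treats all parameters as equally important.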
Taxonomy
Topics: Tensor decomposition and applications · Advanced Neural Network Applications · Medical Image Segmentation Techniques
Methods: Attentive Walk-Aggregating Graph Neural Network
