Lossless Model Compression via Joint Low-Rank Factorization Optimization
Boyang Zhang, Daning Cheng, Yunquan Zhang, Fangming Liu, Jiake Tian

TL;DR
This paper introduces a joint low-rank factorization optimization method that compresses models losslessly, exceeding the original models' performance without fine-tuning, and applies to both vision and language models.
Contribution
It presents a novel joint optimization strategy for low-rank weight factorization that guarantees lossless compression and improved performance, unlike previous approaches that optimize the factorization and the model's performance separately.
Findings
Achieves 70% compression on ResNeXt-50 with better performance than the original model
Develops algorithms that do not require fine-tuning for lossless compression
Demonstrates robustness across various vision and language tasks
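The parameter arithmetic behind a compression figure like the ResNeXt-50 result can be sketched as follows. This rests on stated assumptions: a dense m×n layer is replaced by factors of shape m×r and r×n, biases are ignored, and "70% compression" is read as removing 70% of the layer's parameters. The paper's per-layer ranks and its exact definition of compression are not given here, so `max_rank_for_reduction` is a hypothetical helper, not the authors' procedure.

```python
import math


def low_rank_params(m: int, n: int, r: int) -> int:
    """Parameters in a rank-r factorization W ~ U @ V with U: m x r and V: r x n."""
    return r * (m + n)


def max_rank_for_reduction(m: int, n: int, reduction: float) -> int:
    """Largest rank r whose two factors use at most (1 - reduction) * m * n parameters.

    `reduction` is the fraction of the dense layer's parameters to remove,
    e.g. 0.7 for a 70% reduction (an assumed reading of "70% compression").
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    budget = (1.0 - reduction) * m * n  # parameter budget left for U and V
    return math.floor(budget / (m + n))


if __name__ == "__main__":
    m, n = 1024, 1024
    r = max_rank_for_reduction(m, n, 0.7)
    print(r, low_rank_params(m, n, r), m * n)
```

For a 1024×1024 layer this gives rank 153 (313,344 of 1,048,576 parameters). A factorization only saves parameters while r(m + n) < mn, which for this shape means a rank below 512.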
Abstract
Low-rank factorization is a popular model compression technique that minimizes the error between the approximated and original weight matrices. Although it achieves performance close to that of the original models when this error is minimized, a performance gap remains because low-rank factorization and model performance are optimized separately, resulting in unavoidable losses. We address this issue by introducing a novel joint optimization strategy for lossless low-rank weight factorization, which, for the first time, enhances the model's performance beyond the original. Our approach begins with a theoretical analysis of the relationship between low-rank factorization and model optimization objectives, establishing a precise perturbation range for matrix factorization errors on model performance. This challenge is then reformulated as a numerical rank deficiency problem…
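As a reference point for the error-only baseline the abstract describes, the sketch below factors a weight matrix with a truncated SVD. By the Eckart–Young theorem this gives the smallest Frobenius-norm error among all rank-r approximations. It is the conventional objective, not the authors' joint optimization. `truncated_svd_factors` and `numerical_rank` are hypothetical helper names, and defining numerical rank by a relative tolerance `rtol` on the singular values is an illustrative assumption rather than the paper's definition.

```python
import numpy as np


def truncated_svd_factors(W: np.ndarray, r: int):
    """Return U_r (m x r) and V_r (r x n) such that U_r @ V_r is the best
    rank-r approximation of W in Frobenius norm (Eckart-Young)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Fold the top-r singular values into the left factor so the layer
    # becomes two smaller matrix multiplications.
    U_r = U[:, :r] * S[:r]
    V_r = Vt[:r, :]
    return U_r, V_r


def numerical_rank(W: np.ndarray, rtol: float) -> int:
    """Count singular values larger than rtol times the largest one."""
    S = np.linalg.svd(W, compute_uv=False)  # sorted in descending order
    return int(np.sum(S > rtol * S[0]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 128))
    r = 32
    U_r, V_r = truncated_svd_factors(W, r)
    err = np.linalg.norm(W - U_r @ V_r, "fro")
    S = np.linalg.svd(W, compute_uv=False)
    # The reconstruction error equals the norm of the discarded singular values.
    print(err, np.sqrt(np.sum(S[r:] ** 2)))
```

Because the only objective here is the weight error, the factors take no account of the model's loss; this is the separation between factorization and model performance that the abstract identifies as the source of the remaining gap.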
Taxonomy
Topics: Advanced Data Compression Techniques
