An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers
Ashim Gupta, Sina Mahdipour Saravani, P. Sadayappan, Vivek Srikumar

TL;DR
This paper compares matrix factorization techniques for compressing large pre-trained transformers and finds that simple low-rank methods outperform more complex approaches such as Monarch on various NLP tasks.
Contribution
It introduces a staged low-rank factorization approach to improve stability and demonstrates that simple low-rank methods outperform Monarch factorization across multiple benchmarks.
Findings
Low-rank factorization outperforms Monarch across tasks.
Staged factorization enhances stability of compression.
Simple low-rank methods are more effective than the more complex Monarch factorization.
Abstract
The increasing size of transformer-based models in NLP makes the question of compressing them important. In this work, we present a comprehensive analysis of factorization-based model compression techniques. Specifically, we focus on comparing straightforward low-rank factorization against the recently introduced Monarch factorization, which exhibits impressive performance preservation on the GLUE benchmark. To mitigate stability issues associated with low-rank factorization of the matrices in pre-trained transformers, we introduce a staged factorization approach wherein layers are factorized one by one instead of being factorized simultaneously. Through this strategy, we significantly enhance the stability and reliability of the compression process. Further, we introduce a simple block-wise low-rank factorization method, which has a close relationship to Monarch factorization. Our…
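To make the three ideas named in the abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the low-rank step uses a truncated SVD, the block-wise variant assumes the weight matrix is split into equal column blocks that are factorized independently, and the `recover` hook (a stand-in for whatever fine-tuning happens between stages) is hypothetical. All function names are invented for this example.

```python
import numpy as np


def low_rank_factorize(W, rank):
    """Return (A, B) such that A @ B is the best rank-`rank`
    approximation of W in the Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # shape (m, rank), singular values folded in
    B = Vt[:rank, :]            # shape (rank, n)
    return A, B


def blockwise_low_rank_factorize(W, num_blocks, rank):
    """Assumed block-wise scheme: split the columns of W into
    `num_blocks` equal blocks and factorize each block on its own."""
    blocks = np.split(W, num_blocks, axis=1)
    return [low_rank_factorize(block, rank) for block in blocks]


def staged_factorize(weights, rank, recover=None):
    """Factorize layers one at a time rather than all at once.

    `weights` maps layer names to 2-D arrays. `recover` is a
    hypothetical callback invoked after each stage, for example to
    fine-tune briefly before the next layer is factorized.
    """
    factors = {}
    for name, W in weights.items():
        factors[name] = low_rank_factorize(W, rank)
        if recover is not None:
            recover(name, factors)
    return factors
```

For an m × n matrix, the factors store rank × (m + n) values instead of m × n, so compression only pays off when the rank is well below mn / (m + n).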
Taxonomy
Topics: Graph Theory and Algorithms · Parallel Computing and Optimization Techniques · Neural Networks and Applications
