An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers

Ashim Gupta; Sina Mahdipour Saravani; P. Sadayappan; Vivek Srikumar

arXiv:2406.11307·cs.CL·June 18, 2024

TL;DR

This paper compares different matrix factorization techniques for compressing large pre-trained transformers, finding that simple low-rank methods outperform more complex approaches like Monarch in various NLP tasks.

Contribution

It introduces a staged low-rank factorization approach to improve stability and demonstrates that simple low-rank methods outperform Monarch factorization across multiple benchmarks.

Findings

01. Low-rank factorization outperforms Monarch across tasks.

02. Staged factorization enhances stability of compression.

03. Simple methods are more effective than complex ones.

Abstract

The increasing size of transformer-based models in NLP makes the question of compressing them important. In this work, we present a comprehensive analysis of factorization-based model compression techniques. Specifically, we focus on comparing straightforward low-rank factorization against the recently introduced Monarch factorization, which exhibits impressive performance preservation on the GLUE benchmark. To mitigate stability issues associated with low-rank factorization of the matrices in pre-trained transformers, we introduce a staged factorization approach in which layers are factorized one by one instead of simultaneously. Through this strategy, we significantly enhance the stability and reliability of the compression process. Further, we introduce a simple block-wise low-rank factorization method, which has a close relationship to Monarch factorization. Our…
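To make the two ideas in the abstract concrete, here is a minimal NumPy sketch of low-rank factorization applied in stages. It is an illustration under stated assumptions, not the authors' implementation: the truncated SVD is one common way to obtain a low-rank factorization and may differ from what the paper uses, and the names `low_rank_factorize`, `staged_factorize`, and the `on_stage` callback are hypothetical. The callback stands in for whatever happens between stages (for example, recovery training), which the abstract does not specify.

```python
import numpy as np


def low_rank_factorize(W, rank):
    """Return (A, B) such that A @ B is the best rank-`rank` approximation
    of W in Frobenius norm, computed with a truncated SVD (an assumption)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # scale the kept left singular vectors, shape (m, rank)
    B = Vt[:rank, :]            # kept right singular vectors, shape (rank, n)
    return A, B


def staged_factorize(weights, rank, on_stage=None):
    """Factorize a list of weight matrices one layer at a time.

    Entries of the returned list are (A, B) tuples. `on_stage` is a
    hypothetical hook called after each layer is factorized; it receives
    the current list of layers and the index of the layer just replaced.
    """
    layers = list(weights)
    for i in range(len(layers)):
        layers[i] = low_rank_factorize(layers[i], rank)
        if on_stage is not None:
            on_stage(layers, i)
    return layers


# Toy usage: three random 64x64 "layers", each compressed to rank 8.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 64)) for _ in range(3)]
stage_log = []
layers = staged_factorize(weights, rank=8,
                          on_stage=lambda ls, i: stage_log.append(i))

A, B = layers[0]
dense_params = weights[0].size   # 64 * 64 = 4096
factored_params = A.size + B.size  # 8 * (64 + 64) = 1024
```

In this toy setting each factorized layer stores a quarter of the original parameters. Factorizing all layers simultaneously would correspond to running the loop with no work between iterations; the staged variant differs only in that something (here, the placeholder hook) runs after each layer is replaced.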

Taxonomy

Topics: Graph Theory and Algorithms · Parallel Computing and Optimization Techniques · Neural Networks and Applications