Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya

TL;DR
Greenformers introduces low-rank approximation techniques to enhance transformer model efficiency, reducing computational and environmental costs, especially for short-sequence processing and on-device deployment.
Contribution
The thesis proposes the Low-Rank Transformer (LRT), a novel low-rank factorization method, and compares it with Linformer, demonstrating improved efficiency and reduced costs for transformer models.
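To make the idea of low-rank factorization concrete, here is a minimal sketch in plain Python. It assumes the factorization is applied to the weight matrix of a linear layer, so that a dense map y = W x is replaced by y = U (V x) with a small inner dimension. The function names, the rank of 128, and the layer sizes 768 and 3072 (the hidden and feed-forward widths of BERT-base) are illustrative choices, not details taken from the thesis.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def low_rank_forward(U, V, x):
    """Apply a factorized linear layer: y = U (V x).

    V has shape (rank, d_in) and U has shape (d_out, rank), so the
    full (d_out, d_in) matrix U V is never materialized.
    """
    return matvec(U, matvec(V, x))


def dense_params(d_in, d_out):
    """Weights (and multiply-adds per token) of a dense layer, bias ignored."""
    return d_in * d_out


def low_rank_params(d_in, d_out, rank):
    """Weights (and multiply-adds per token) of the factorized layer."""
    return rank * (d_in + d_out)


def break_even_rank(d_in, d_out):
    """Rank below which the factorized layer is cheaper than the dense one."""
    return d_in * d_out / (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical sizes: a 768 -> 3072 layer factorized with rank 128.
    d_in, d_out, rank = 768, 3072, 128
    print("dense weights:    ", dense_params(d_in, d_out))
    print("low-rank weights: ", low_rank_params(d_in, d_out, rank))
    print("break-even rank:  ", break_even_rank(d_in, d_out))
```

The saving comes entirely from choosing a rank well below the break-even value. How much accuracy survives at a given rank is an empirical question that this sketch does not address.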
Findings
Low-Rank Transformer improves efficiency for short-sequence data.
Linformer is more effective for long-sequence data.
Applying LRT to BERT-base reduces costs by over 30%.
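The split between short and long sequences in the findings above can be illustrated with a rough operation count for one self-attention layer. The sketch below is a back-of-the-envelope estimate under stated assumptions, not the thesis's measurement: it counts only multiply-adds, ignores attention heads, softmax, biases, and the feed-forward block, models the low-rank variant as factorizing the four projection matrices (query, key, value, output) with a hypothetical rank, and models the Linformer-style variant as projecting keys and values from sequence length n down to a hypothetical length k.

```python
def attention_layer_macs(n, d, rank=None, k=None):
    """Rough multiply-add count for one self-attention layer.

    n    : sequence length
    d    : model width
    rank : if set, each d x d projection is factorized into two
           matrices of shapes (rank, d) and (d, rank)
    k    : if set, keys and values are projected from length n to
           length k before attention (Linformer-style)
    """
    # Four projections (query, key, value, output) applied to n tokens.
    per_projection = d * d if rank is None else 2 * rank * d
    projections = 4 * n * per_projection

    if k is None:
        # Scores Q K^T and the weighted sum over V: n * n * d each.
        attention = 2 * n * n * d
    else:
        # Compress K and V along the sequence axis (k * n * d each),
        # then compute scores and the weighted sum against k positions.
        attention = 2 * k * n * d + 2 * n * k * d

    return projections + attention


if __name__ == "__main__":
    d = 768  # illustrative width (BERT-base hidden size)
    for n in (128, 4096):
        dense = attention_layer_macs(n, d)
        low_rank = attention_layer_macs(n, d, rank=128)
        linformer_style = attention_layer_macs(n, d, k=64)
        print(f"n={n}: dense={dense}, low-rank weights={low_rank}, "
              f"sequence projection={linformer_style}")
```

Under these assumptions the projection cost grows linearly in n while the attention cost grows quadratically. Factorizing the weights therefore helps most when n is small, and compressing the sequence axis helps most when n is large, which matches the direction of the findings.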
Abstract
In this thesis, we introduce Greenformers, a collection of model efficiency methods that improve the efficiency of the recently renowned transformer models through a low-rank approximation approach. The development trend of deep learning models tends to result in more complex and larger models. Although this leads to better and more accurate predictions, the resulting models become even more costly, as they require weeks of training with a huge amount of GPU resources. In particular, the size and computational cost of transformer-based models have increased tremendously since their debut in 2017, from ~100 million parameters up to ~1.6 trillion parameters in early 2021. These computationally hungry models also incur a substantial cost to the environment and even reach an alarming level of carbon footprint. Some of these models are so massive that it is even impossible to run the…
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Image and Signal Denoising Methods · Neural Networks and Applications
Methods: Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Softmax · Byte Pair Encoding · Multi-Head Attention · Dropout · Dense Connections · Layer Normalization
