Training Acceleration of Low-Rank Decomposed Networks using Sequential Freezing and Rank Quantization
Habib Hajimolahoseini, Walid Ahmed, Yang Liu

TL;DR
This paper introduces techniques to accelerate low-rank decomposed neural networks by optimizing ranks and freezing layers sequentially. It reports up to 60% higher training throughput and up to 37% faster inference while keeping accuracy close to the original models.
Contribution
The paper proposes rank optimization and sequential freezing methods to speed up training of low-rank decomposed models without requiring small decomposition ranks.
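This page does not spell out either method. The sketch below shows one plausible reading, and both parts are assumptions rather than the authors' algorithm.

For rank optimization ("rank quantization" in the title), it assumes that each decomposition rank is rounded up to a hardware-friendly multiple. GPU matrix kernels tend to run most efficiently when dimensions are multiples of a tile size. The multiple of 8 used here is a hypothetical choice.

For sequential freezing, it assumes that only one group of decomposed factors is trainable at a time and that the groups take turns. Grouping all first factors U versus all second factors V is an illustrative choice. The helper names `quantize_rank` and `trainable_groups` are hypothetical.

```python
# Hypothetical sketch of the two ideas; NOT the authors' implementation.
# Assumption 1: "rank quantization" = round each rank UP to a multiple of a
#               hardware-friendly granularity (default 8), capped at full rank.
# Assumption 2: "sequential freezing" = at each stage only one group of
#               decomposed factors is trainable; the groups take turns.


def quantize_rank(rank: int, full_rank: int, multiple: int = 8) -> int:
    """Round `rank` up to the next multiple of `multiple`, never above `full_rank`."""
    if rank < 1 or full_rank < 1 or multiple < 1:
        raise ValueError("rank, full_rank and multiple must be positive")
    rounded = -(-rank // multiple) * multiple  # integer ceiling to a multiple
    return min(rounded, full_rank)


def trainable_groups(stage: int, num_groups: int) -> list[bool]:
    """Return one trainable flag per group; only group `stage % num_groups` is unfrozen."""
    if num_groups < 1:
        raise ValueError("num_groups must be positive")
    active = stage % num_groups
    return [g == active for g in range(num_groups)]


# Usage: a decomposed layer W ≈ U @ V has two factor groups, {U} and {V}.
ranks = [37, 64, 100, 1020]
print([quantize_rank(r, full_rank=1024) for r in ranks])  # [40, 64, 104, 1024]
for epoch in range(4):
    # Alternates [True, False] / [False, True]: U trains, then V, and so on.
    print(epoch, trainable_groups(epoch, num_groups=2))
```

Rounding up rather than down keeps the rank at least as large as requested, which matches the stated goal of not shrinking ranks. In a PyTorch training loop, the flags would typically be applied by calling `param.requires_grad_(flag)` on each factor's parameters. Frozen parameters need no weight gradients in the backward pass.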
Findings
Up to 60% training throughput improvement
Up to 37% inference speedup
Maintains accuracy close to original models
Abstract
Low Rank Decomposition (LRD) is a model compression technique applied to the weight tensors of deep learning models to reduce the number of trainable parameters and the computational complexity. However, because LRD adds a large number of new layers to the architecture, it may not yield a large training/inference speedup unless the decomposition ranks are small enough. The problem is that small ranks increase the risk of a significant accuracy drop after decomposition. In this paper, we propose two techniques for accelerating low rank decomposed models without requiring small decomposition ranks: rank optimization and sequential freezing of decomposed layers. We perform experiments on both convolutional and transformer-based models. Experiments show that these techniques can improve the model throughput by up to 60% during training…
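A parameter count makes the abstract's trade-off concrete. This minimal sketch assumes the simplest case: a fully connected layer with an m × n weight W is replaced by two factors, U (m × r) and V (r × n). Convolutional and attention layers follow the same pattern with different shape arithmetic. The helper names are hypothetical, and biases are ignored.

```python
# Minimal sketch: cost of a dense layer y = W x (W is m x n) versus its
# rank-r factorization W ≈ U @ V with U (m x r) and V (r x n).
# Biases are ignored; for a linear layer, parameters = MACs per input vector.


def dense_cost(m: int, n: int) -> int:
    """Parameters (= multiply-accumulates per input vector) of an m x n layer."""
    return m * n


def lowrank_cost(m: int, n: int, r: int) -> int:
    """Parameters (= MACs per input vector) of the two factors U and V."""
    return r * (m + n)


def breakeven_rank(m: int, n: int) -> int:
    """Largest rank r for which the factorization is strictly cheaper than W."""
    # r * (m + n) < m * n  <=>  r < m * n / (m + n); integer form avoids floats.
    return (m * n - 1) // (m + n)


m, n = 1024, 1024
print("break-even rank:", breakeven_rank(m, n))  # 511
for r in (64, 256, 511, 512):
    ratio = lowrank_cost(m, n, r) / dense_cost(m, n)
    # Every decomposed layer runs as two sequential matmuls instead of one.
    print(f"rank {r:4d}: {ratio:.3f}x the dense cost, 2 layers instead of 1")
```

Even well below the break-even rank, each decomposed layer becomes two sequential operations. On GPUs this typically means more kernel launches and less parallel work per launch. That is the overhead that can erase the speedup when ranks are not small.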
Taxonomy
Topics: Tensor decomposition and applications · Computational Physics and Python Applications · Advanced Neural Network Applications
