Doping: A technique for efficient compression of LSTM models using sparse structured additive matrices
Urmish Thakker, Paul N. Whatmough, Zhigang Liu, Matthew Mattina, Jesse Beu

TL;DR
This paper introduces doping, a novel method that adds sparse matrices to structured matrices for compressing LSTM models efficiently, achieving high compression ratios with minimal accuracy loss.
Contribution
The paper proposes doping and associated regularization techniques to improve structured matrix compression of neural networks, demonstrating state-of-the-art results in NLP tasks.
Findings
Achieves 10-25x compression with minor accuracy loss.
Outperforms pruning and low-rank methods significantly.
Enables hardware-efficient deployment with 2.5-5.5x speed-up.
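To make the compression and speed-up claims concrete, here is a minimal NumPy sketch of a doped Kronecker-product (KP) matrix, W = kron(A, B) + S. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy shapes, and the random doping pattern are hypothetical, and the parameter count ignores the index storage a real sparse format would need. It shows that the matrix-vector product can be computed without ever materializing the dense W, which is one plausible reason structured matrices run faster on hardware.

```python
import numpy as np

def doped_kp_matvec(A, B, S, x):
    """Compute (kron(A, B) + S) @ x without building kron(A, B).

    Uses the identity kron(A, B) @ x == (A @ X @ B.T).ravel(),
    where X is x reshaped row-major to (A.shape[1], B.shape[1]).
    """
    m1, n1 = A.shape
    m2, n2 = B.shape
    X = x.reshape(n1, n2)
    structured = (A @ X @ B.T).reshape(m1 * m2)
    # S is held dense here for simplicity; a deployment would use a sparse format.
    return structured + S @ x

def compression_ratio(A, B, S):
    """Dense parameter count divided by stored parameters (index overhead ignored)."""
    dense = A.shape[0] * B.shape[0] * A.shape[1] * B.shape[1]
    stored = A.size + B.size + np.count_nonzero(S)
    return dense / stored

# Toy example with hypothetical sizes: a 32x32 weight matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((8, 8))
S = np.zeros((32, 32))
idx = rng.choice(S.size, size=20, replace=False)  # 20 doped positions (assumed)
S.flat[idx] = rng.standard_normal(20)
x = rng.standard_normal(32)

y = doped_kp_matvec(A, B, S, x)
ratio = compression_ratio(A, B, S)  # 1024 / (16 + 64 + 20) = 10.24
```

The toy ratio depends entirely on the shapes and doping budget chosen here and says nothing about the accuracy trade-off reported in the paper.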
Abstract
Structured matrices, such as those derived from Kronecker products (KP), are effective at compressing neural networks, but can lead to unacceptable accuracy loss when applied to large models. In this paper, we propose the notion of doping -- addition of an extremely sparse matrix to a structured matrix. Doping facilitates additional degrees of freedom for a small number of parameters, allowing them to independently diverge from the fixed structure. To train LSTMs with doped structured matrices, we introduce the additional parameter matrix while slowly annealing its sparsity level. However, we find that performance degrades as we slowly sparsify the doping matrix, due to co-matrix adaptation (CMA) between the structured and the sparse matrices. We address this overdependence on the sparse matrix using a co-matrix dropout regularization (CMR) scheme. We provide empirical evidence to show…
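The training recipe described in the abstract (anneal the sparsity of the doping matrix, and regularize with co-matrix dropout) can be sketched roughly as follows. The abstract does not specify the schedule, the pruning criterion, or the exact form of the dropout, so the linear ramp, the magnitude-based pruning, the branch-level dropout, and the `drop_prob` value below are all assumptions made for illustration.

```python
import numpy as np

def annealed_sparsity(step, total_steps, final_sparsity):
    """Assumed linear ramp of the target sparsity from 0 to final_sparsity."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity * frac

def prune_to_sparsity(S, sparsity):
    """Zero the smallest-magnitude entries of S (assumed pruning criterion)."""
    k = int(round(sparsity * S.size))  # number of entries to zero
    out = S.copy().ravel()
    if k > 0:
        smallest = np.argpartition(np.abs(out), k - 1)[:k]
        out[smallest] = 0.0
    return out.reshape(S.shape)

def co_matrix_dropout_forward(A, B, S, x, rng, drop_prob=0.3, training=True):
    """Forward pass that sometimes drops one whole branch during training.

    Hypothetical form of co-matrix dropout: with probability drop_prob,
    either the structured or the sparse branch is zeroed, so neither
    matrix can rely on the other always being present.
    """
    structured = np.kron(A, B) @ x
    sparse = S @ x
    if training:
        u = rng.random()
        if u < drop_prob / 2:
            structured = np.zeros_like(structured)
        elif u < drop_prob:
            sparse = np.zeros_like(sparse)
    return structured + sparse

# Toy usage with hypothetical sizes.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3))
S_dense = rng.standard_normal((6, 6))
x = rng.standard_normal(6)

target = annealed_sparsity(step=100, total_steps=100, final_sparsity=0.75)
S = prune_to_sparsity(S_dense, target)
y_eval = co_matrix_dropout_forward(A, B, S, x, rng, training=False)
```

In a real training loop the pruning step would be applied repeatedly as `step` advances, with gradient updates in between; that loop is omitted here.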
Taxonomy
Topics: Topic Modeling · Machine Learning and ELM · Advanced Neural Network Applications
Methods: Pruning · Dropout · Kollen-Pollack Learning
